

Psychometrika

VOLUME XXIII, 1958
JANUARY-DECEMBER


Editorial Council

Chairman: HAROLD GULLIKSEN
Editors: DOROTHY C. ADKINS, PAUL HORST
Managing Editor: LYLE V. JONES
Assistant Managing Editor: B. J. WINER

Editorial Board

DOROTHY C. ADKINS
R. L. ANDERSON
T. W. ANDERSON
J. B. CARROLL
H. S. CONRAD
C. H. COOMBS
L. J. CRONBACH
E. E. CURETON
PAUL S. DWYER
ALLEN EDWARDS
MAX D. ENGELHART
WM. K. ESTES
HENRY E. GARRETT
LEO A. GOODMAN
BERT F. GREEN
J. P. GUILFORD
HAROLD GULLIKSEN
PAUL HORST
ALSTON S. HOUSEHOLDER
LLOYD G. HUMPHREYS
TRUMAN L. KELLEY
ALBERT K. KURTZ
FREDERIC M. LORD
IRVING LORGE
QUINN McNEMAR
GEORGE A. MILLER
WM. G. MOLLENKOPF
LINCOLN E. MOSES
GEORGE E. NICHOLSON
M. W. RICHARDSON
R. L. THORNDIKE
LEDYARD TUCKER
D. F. VOTAW, JR.


PUBLISHED QUARTERLY 


By THE PSYCHOMETRIC SOCIETY 
AT 1407 SHERWOOD AVENUE
RICHMOND, VIRGINIA


Psychometrika

CONTENTS

GENERAL RESOLUTION OF CORRELATION MATRICES INTO
COMPONENTS AND ITS UTILIZATION IN MULTIPLE
AND PARTIAL REGRESSION . . . . . . . . . . . . . . . 1
    JOHN A. CREAGER

ERROR OF MEASUREMENT AND THE SENSITIVITY OF A
TEST OF SIGNIFICANCE . . . . . . . . . . . . . . . . 9
    J. P. SUTCLIFFE

DETERMINATION OF PARAMETERS OF A FUNCTIONAL
RELATION BY FACTOR ANALYSIS . . . . . . . . . . . . 19
    LEDYARD R TUCKER

THE INCLUSION OF RESPONSE TIMES WITHIN A STOCHASTIC
DESCRIPTION OF THE LEARNING BEHAVIOR OF
INDIVIDUAL SUBJECTS . . . . . . . . . . . . . . . . 25
    R. J. AUDLEY

DETERMINING THE DEGREE OF INCONSISTENCY IN A SET
OF PAIRED COMPARISONS
    HAROLD B. GERARD AND HAROLD N. SHAPIRO

PROPERTIES OF THE ITEM SCORE MATRIX . . . . . . . . 47
    ANGUS G. MACLEAN

THE COUNSELING ASSIGNMENT PROBLEM . . . . . . . . . 55
    JOE H. WARD, JR.

A RETEST METHOD OF STUDYING PARTIAL KNOWLEDGE
AND OTHER FACTORS INFLUENCING ITEM RESPONSE . . . . 67
    VERA T. BROWNLESS AND JOHN A. KEATS

THE MEASUREMENT OF FUNCTION FLUCTUATION . . . . . . 75
    R. F. GARSIDE

PREDETERMINATION OF TEST WEIGHTS . . . . . . . . . . 85
    PAUL J. HOFFMAN

RULES FOR PREPARATION OF MANUSCRIPTS FOR
PSYCHOMETRIKA . . . . . . . . . . . . . . . . . . . 93

VOLUME TWENTY-THREE    MARCH 1958    NUMBER 1



PSYCHOMETRIC MONOGRAPHS 


The issues of this series are: 


THURSTONE, L. L. Primary mental abilities.
Psychometric Monograph No. 1, $3.00. (Second impression, cloth binding.)

THURSTONE, L. L. AND THURSTONE, THELMA GWINN. Factorial studies
of intelligence.
Psychometric Monograph No. 2, (out of print).

WOLFLE, DAEL. Factor analysis to 1940.
Psychometric Monograph No. 3, (out of print).

THURSTONE, L. L. A factorial study of perception.
Psychometric Monograph No. 4, (out of print).

FRENCH, JOHN W. The description of aptitude and achievement tests
in terms of rotated factors.
Psychometric Monograph No. 5, $4.00.

DEGAN, JAMES W. Dimensions of functional psychosis.
Psychometric Monograph No. 6, $1.50.

LORD, FREDERIC. A theory of test scores.
Psychometric Monograph No. 7, $2.00.

ROFF, MERRILL. A factorial study of tests in the perceptual area.
Psychometric Monograph No. 8, $1.50.

Orders for Psychometric Monograph No. 1 should be sent to The University
of Chicago Press, 5750 S. Ellis Avenue, Chicago 37, Illinois. Orders
for No. 5 through No. 8 should be sent to The William Byrd Press, Box 2-W,
Richmond 5, Virginia.



Psychometrika

CONTENTS

RELIABILITY FOR THE LAW OF COMPARATIVE JUDGMENT . . 95
    HAROLD GULLIKSEN AND JOHN W. TUKEY

AN INTER-BATTERY METHOD OF FACTOR ANALYSIS . . . . 111
    LEDYARD R TUCKER

COMPARATAL DISPERSION, A MEASURE OF ACCURACY OF
JUDGMENT . . . . . . . . . . . . . . . . . . . . . 137
    HAROLD GULLIKSEN

APPLICATION OF THE QUARTIMAX METHOD OF ROTATION
TO THURSTONE'S PRIMARY MENTAL ABILITIES STUDY . . . 151
    CHARLES WRIGLEY, DAVID R. SAUNDERS, AND JACK O. NEUHAUS

A DISTINCTION BETWEEN EXACT AND APPROXIMATE
NONPARAMETRIC METHODS . . . . . . . . . . . . . . . 171
    WILLIAM L. SAWREY

BOOK REVIEWS

LEE J. CRONBACH AND GOLDINE C. GLESER. Psychological Tests
and Personnel Decisions.
    Review by BERT F. GREEN, JR.

JOE K. ADAMS. Basic Statistical Concepts.
    Review by ROBERT E. MORIN

HENRY E. GARRETT. Elementary Statistics.
    Review by ROBERT E. MORIN

W. ALLEN WALLIS AND HARRY V. ROBERTS. Statistics: A New Approach.
    Review by CHARLES L. WOOD

JEROME S. BRUNER, JACQUELINE J. GOODNOW, AND GEORGE A.
AUSTIN. A Study of Thinking.
    Review by ROBERT GLASER

VOLUME TWENTY-THREE    JUNE 1958    NUMBER 2



Psychometrika

CONTENTS

THE VARIMAX CRITERION FOR ANALYTIC ROTATION IN
FACTOR ANALYSIS
    HENRY F. KAISER

POWER FUNCTION CHARTS FOR SPECIFICATION OF
SAMPLE SIZE IN ANALYSIS OF VARIANCE . . . . . . . . 201
    LEONARD S. FELDT AND MOHARRAM W. MAHMOUD

A COMPARATIVE STUDY OF THREE METHODS OF ROTATION
    BENJAMIN FRUCHTER AND EDWIN NOVAK

ANALYSIS OF VARIANCE FOR CORRELATED OBSERVATIONS
    RAYMOND O. COLLIER, JR.

THURSTONE'S ANALYTICAL METHOD FOR SIMPLE
STRUCTURE AND A MASS MODIFICATION THEREOF
    ROBERT R. SOKAL

ATTENUATION AND INTERACTION
    QUINN McNEMAR

THE KUDER-RICHARDSON FORMULA (21) AS A SPLIT-HALF
COEFFICIENT, AND REMARKS ON ITS BASIC
ASSUMPTION . . . . . . . . . . . . . . . . . . . . 267
    SAMUEL B. LYERLY

THE AVERAGE SPEARMAN RANK CRITERION CORRELATION
WHEN TIES ARE PRESENT . . . . . . . . . . . . . . . 271
    EDWARD E. CURETON

NOTE ON "EFFICIENT ESTIMATION AND LOCAL
IDENTIFICATION IN LATENT CLASS ANALYSIS"
    RICHARD B. McHUGH

BOOK REVIEWS

HENRY QUASTLER (Editor). Information Theory in Psychology.
    Review by JOHN B. CARROLL

WILLIAM G. COCHRAN AND GERTRUDE M. COX. Experimental
Designs. 2nd Edition.
    Review by J. R. WITTENBORN

VOLUME TWENTY-THREE    SEPTEMBER 1958    NUMBER 3



Psychometrika

CONTENTS

THE MYSTERY OF THE MISSING CORPUS . . . . . . . . . 279
    FREDERICK MOSTELLER

SOME RELATIONS BETWEEN GUTTMAN'S PRINCIPAL
COMPONENTS OF SCALE ANALYSIS AND OTHER
PSYCHOMETRIC THEORY . . . . . . . . . . . . . . . . 291
    FREDERIC M. LORD

TO WHAT EXTENT CAN COMMUNALITIES REDUCE RANK? . . . 297
    LOUIS GUTTMAN

A MARKOV MODEL FOR DISCRIMINATION LEARNING . . . . 309
    RICHARD C. ATKINSON

REMARKS ON THE TEST OF SIGNIFICANCE FOR THE
METHOD OF PAIRED COMPARISONS . . . . . . . . . . . 323
    R. DARRELL BOCK

A COMPARISON OF THE PRECISION OF THREE EXPERIMENTAL
DESIGNS EMPLOYING A CONCOMITANT VARIABLE
    LEONARD S. FELDT

AN AXIOMATIC FORMULATION AND GENERALIZATION
OF SUCCESSIVE INTERVALS SCALING . . . . . . . . . . 355
    ERNEST ADAMS AND SAMUEL MESSICK

THE SINGLE LATIN SQUARE DESIGN IN PSYCHOLOGICAL
RESEARCH . . . . . . . . . . . . . . . . . . . . . 369
    JOHN GAITO

A MODIFICATION OF KENDALL'S TAU FOR MEASURING
ASSOCIATION IN CONTINGENCY TABLES . . . . . . . . . 379
    BERTRAM P. KARON AND IRVING E. ALEXANDER

BOOK REVIEWS

GEORGE S. WELSH AND W. GRANT DAHLSTROM (Editors). Basic
Readings on the MMPI in Psychology and Medicine . . 385
    Review by LEE J. CRONBACH

PHILIP H. DUBOIS. Multivariate Correlational Analysis . . 386
    Review by JOHN A. CREAGER

JOHN B. MINER. Intelligence in the United States . . 388
    Review by SUSAN M. ERVIN

ROBERT R. BUSH, ROBERT P. ABELSON, AND RAY HYMAN. Mathematics
for Psychologists, Examples and Problems . . . . . 391
    Review by R. DUNCAN LUCE

CALVIN S. HALL AND GARDNER LINDZEY. Theories of Personality . 391
    Review by D. R. SAUNDERS

G. HERDAN. Language as Choice and Chance . . . . . 392
    Review by R. DARRELL BOCK

MINUTES OF THE 1958 ANNUAL BUSINESS MEETING OF
THE PSYCHOMETRIC SOCIETY . . . . . . . . . . . . . 395

REPORT OF THE COMMITTEE ON THE RELATIONS BETWEEN
THE PSYCHOMETRIC SOCIETY AND THE
PSYCHOMETRIC CORPORATION . . . . . . . . . . . . . 398

TREASURER'S REPORT, PSYCHOMETRIC SOCIETY . . . . . 399

TREASURER'S REPORT, PSYCHOMETRIC CORPORATION . . . 400

INDEX FOR VOLUME 23 . . . . . . . . . . . . . . . . 401

VOLUME TWENTY-THREE    DECEMBER 1958    NUMBER 4


PSYCHOMETRIKA, VOL. 23, NO. 1
MARCH, 1958 


GENERAL RESOLUTION OF CORRELATION MATRICES INTO 
COMPONENTS AND ITS UTILIZATION IN MULTIPLE AND 
PARTIAL REGRESSION* 


JOHN A. CREAGER


AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER 

The derivation of multiple and partial regression statistics from
uniqueness-augmented factor loadings, presented in the literature for
orthogonal factor solutions, is generalized to oblique solutions. A mathematical
rationale for the general case, without restriction to uncorrelated factors,
is presented. Use of the general formulation is illustrated with a two-factor,
seven-variable example.


The considerable amount of computational time and labor required to 
compute multiple and partial correlation statistics when dealing with large 
test batteries is largely due to the necessity of computing the inverse of an 
nth order correlation matrix when classical procedures are used. Compu- 
tation of multiple regression statistics from factor statistics permits con- 
siderable reduction in time and labor, especially when the number of variables 
is large and the number of factors is small [1, 3, 4]. Once the factorial reduction
of the correlation matrix has been effected, any or all of the multiple
and partial correlations or regression weights may be obtained. Furthermore, 
the factor solution may be studied to determine which predictors are most 
likely, when combined, to yield high prediction of a given variable. 

The mathematical foundations and computational techniques for
obtaining multiple and partial regression statistics have been presented for
orthogonal factor solutions by Guttman [3], Guttman and Cohen [4], Dwyer
[1], and Horst [5]. Some of the saving in computational effort is lost by the
preliminary factor analysis, especially if the centroid method is used with
computation of residuals after extracting each factor. Dwyer [2] has presented
an example in which preliminary factoring was done using the square
root or diagonal method. The multiple-group method, however, permits
the extraction of several factors simultaneously and is therefore highly
efficient. Since the multiple-group method will, in general, result in correlated
factors, the solution must either be orthogonalized, which requires appreciable
additional computation, or oblique factor statistics must be used directly
to obtain the multiple and partial regression statistics.

It is the purpose of this paper to present the mathematical rationale,
and to demonstrate, by an illustrative example, the computational schemes
for obtaining multiple and partial regression statistics from oblique factor
solutions.

*This report is based on work done under ARDC Project 7702, in support of the
research and development program of the Air Force Personnel and Training Research
Center. Permission is granted for reproduction, translation, publication, use, and
disposal in whole and in part by or for the United States Government.


Fundamental Relations 


Let $R$ be an $n \times n$ correlation matrix of $n$ variables with unit diagonals.
Let $R$ be factored, without restriction to uncorrelated factors, into $r$ common
factors and $n$ unique factors, yielding

(i) a factor structure matrix, $S$, of order $n \times r$,

(ii) a factor intercorrelation matrix, $\Phi$, of order $r \times r$,

(iii) a factor pattern matrix, $P$, of order $n \times r$, obtained from $P = S\Phi^{-1}$,

(iv) a diagonal matrix, $U$, of order $n$, giving the unique factor loadings.

Then

(1) $R = SP' + U^2$.

Formula (1) states the fundamental factor theorem in general terms,
where resolution of a correlation matrix is made into common factors, either
correlated or uncorrelated, and unique factors which are uncorrelated either
inter se or with the common factors.

In the subsequent development it is assumed that matrices $R$ and $U^2$
are nonsingular. Let $V = U^{-1}$, and define $B = VS$ and $C' = P'V$, the
uniqueness-augmented structure and pattern, respectively. Also let

(2) $Q = I + P'V^2S$,

where $Q$ is a Gramian matrix of order and rank $r$.

The Inverse of the Intercorrelation Matrix 

The inverse of an intercorrelation matrix, $R^{-1}$, may be expressed in
terms of oblique factor statistics. Starting with (1) and premultiplying
both sides by $P'V^2$ gives

(3) $P'V^2R = P'V^2SP' + P'V^2U^2 = (P'V^2S + I)P' = QP'$.

Postmultiplying by $R^{-1}$ gives

(4) $P'V^2 = QP'R^{-1}$,

and therefore

(5) $Q^{-1}P'V^2 = P'R^{-1}$.

Premultiplying both sides of (5) by $S$, the factor structure, and adding
$U^2R^{-1}$ gives

(6) $SQ^{-1}P'V^2 + U^2R^{-1} = I$.

Subtracting $SQ^{-1}P'V^2$ from both sides and dividing by $U^2$ yields

(7) $R^{-1} = V^2(I - SQ^{-1}P'V^2) = V^2 - VBQ^{-1}C'V$.

Use of (7) requires $Q^{-1}$, which is of order $r$, compared to $R^{-1}$, which is of order $n$.
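
As a numerical check on (7), the following is a minimal sketch in plain Python. It assumes a made-up two-factor oblique solution: the pattern `P` and the factor intercorrelations `Phi` are illustrative values, not the paper's example, and the helper names are hypothetical. The sketch builds $R$ from (1), forms $B$, $C$, and $Q$, inverts only the 2 × 2 matrix $Q$, and confirms that the result of (7) multiplied by $R$ is the identity.

```python
# Illustrative check of (7): R^{-1} = V^2 - V B Q^{-1} C' V.
# P and Phi are invented numbers for this sketch, not data from the paper.

def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2 x 2 matrix (the only inversion (7) needs when r = 2)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def correlation_and_inverse(P, Phi):
    n, r = len(P), len(Phi)
    S = matmul(P, Phi)                         # structure S = P Phi, i.e. P = S Phi^{-1}
    SPt = matmul(S, transpose(P))              # common part S P'
    u2 = [1.0 - SPt[i][i] for i in range(n)]   # uniquenesses, so R has unit diagonals
    R = [[SPt[i][j] + (u2[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = [x ** -0.5 for x in u2]                # diagonal of V = U^{-1}
    B = [[v[i] * s for s in S[i]] for i in range(n)]   # B = V S
    C = [[v[i] * p for p in P[i]] for i in range(n)]   # C = V P, so C' = P' V
    CtB = matmul(transpose(C), B)
    Q = [[CtB[p][q] + (1.0 if p == q else 0.0) for q in range(r)] for p in range(r)]  # (2)
    M = matmul(matmul(B, inv2(Q)), transpose(C))       # B Q^{-1} C'
    R_inv = [[(v[i] ** 2 if i == j else 0.0) - v[i] * M[i][j] * v[j]
              for j in range(n)] for i in range(n)]    # (7)
    return R, R_inv

P = [[0.6, 0.1], [0.7, 0.0], [0.5, 0.2], [0.0, 0.6], [0.1, 0.7]]
Phi = [[1.0, 0.3], [0.3, 1.0]]
R, R_inv = correlation_and_inverse(P, Phi)
product = matmul(R, R_inv)
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(5) for j in range(5))
print(max_error < 1e-9)
```

The only matrix inverted is $Q$, of order $r$; the $n \times n$ inverse is assembled from products, which is the saving the text describes.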


Obtaining Regression Statistics 


Standard regression weights to be applied to predictor variables in the
multiple regression of a given criterion may be obtained in either of two
ways. If partial correlation statistics are not required, the $Q$ matrix may
be developed by (2) using uniqueness-augmented factor statistics for the
predictors only. Let this matrix be designated as $Q_j$, where $j$ refers to the
omitted criterion variable. If $Q_j$ is used in (7), the inverse of the predictor
intercorrelation matrix will be obtained. The desired regression weights
may then be obtained by

(8) $\beta = R_j^{-1} r_j$,

where $r_j$ is a column vector of validity coefficients of order $n \times 1$, and $\beta$
is a column vector of the desired weights. The multiple correlation coefficient
for the set of predictors and the given criterion is given by

(9) $R_j^2 = \beta' r_j$.

If regression weights are not required, the multiple correlation coefficient
may be obtained directly from $R^{-1}$ by

(9a) $R_j^2 = 1 - 1/d_j$,

where $d_j$ is the $j$th diagonal element of $R^{-1}$.

If partial correlations are desired, the inverse of the total correlation matrix,
including the criterion validities, is required. In such a situation the regression
weights and multiple correlation may be obtained from the $Q$ matrix developed
from the entire set of variables. The inverse, $R^{-1}$, is computed from
the $Q$ matrix as indicated by (7); the regression weights are then obtained by

(10) $\beta = -D^{-1}R^{-1}$,

where $D$ is a diagonal matrix derived from the diagonal of $R^{-1}$.
The multiple correlation coefficients may then be obtained by use of
(9). Partial correlations holding constant $n - 2$ variables may be obtained as
the off-diagonal elements of

(11) $-D^{-1/2}R^{-1}D^{-1/2}$.


The Prediction of Factor Scores

The matrix of regression weights for predicting common factors from
tests, $W_s$, is obtained by postmultiplying the inverse of the predictor
intercorrelations by factor validities (the common factor structure),

(12) $W_s = R^{-1}S = [V^2 - VBQ^{-1}C'V]S = VB - VBQ^{-1}C'B$.

Similarly, the matrix of regression weights for predicting unique factor
scores, $W_u$, is

(13) $W_u = R^{-1}U = [V^2 - VBQ^{-1}C'V]U = V - VBQ^{-1}C'$.

The corresponding squared multiple correlation coefficients may then be
obtained as the product sum of regression weights and validities.

In a situation in which only the multiple correlation coefficient for
predicting a common factor from test scores is desired, and the regression
weights are not needed for a prediction equation, it may be obtained very
readily without computation of $R^{-1}$ or the regression weights. The multiple
correlation coefficient for a common factor from tests and the remaining
common factors is equal to that from tests alone, since all of the common
variance is in the test battery and adding the common variance to the battery
will not change its predictive power. Guttman [3] and Dwyer [1] have shown
that the multiple correlation coefficient for predicting a common factor from
remaining factors and tests, for the orthogonal case, is

(14) $R_p = \sqrt{1 - \dfrac{1}{1 + \sum_i B_{ip}^2}} = \sqrt{\dfrac{\sum_i B_{ip}^2}{1 + \sum_i B_{ip}^2}}$.


A similar development for oblique factors yields 


(15) R, = : 


Computational Techniques 


To illustrate computational techniques for the application of the 
principles developed above, the seven-variable, two-factor example used by
Dwyer [1] is convenient, although the saving in computational effort becomes 
more convincing as the number of tests increases more rapidly than the 
number of factors. The correlation matrix is given in Table 1 with exact 
communalities in the diagonal cells. This matrix was factored by the multiple- 
group method, the summations being made over variables 1, 2, and 7 for 
factor I, and over variables 3 and 4 for factor II. The resulting factorial 
statistics are shown in Table 2. In usual applications where exact communali- 
ties are not known, it is necessary to use estimates [7]. 

In a practical situation it is necessary to judge the rank of $R$ and to
test this judgment by examination of the residuals. If $r$ is underestimated,
appreciable residuals will remain; if $r$ is overestimated, some of the saving
in computational labor will be lost. It is essential that residuals be negligible
before proceeding with computation of regression statistics. Otherwise the
latter will be approximated to a degree dependent upon the magnitude of
residuals. The multiple correlation obtained under these conditions will
generally be high by an amount approximately equal to the average of the
absolute residual error [1].

TABLE 1

The Reduced Correlation Matrix*

Test      1      2      3      4      5      6      7

1       450    580   -280    010    360    380    610
2       580    760   -280    100    520    440    780
3      -280   -280    700    560    140   -560   -420
4       010    100    560    610    400   -340   -030
5       360    520    140    400    540    080    460
6       380    440   -560   -340    080    520    540
7       610    780   -420   -030    460    540    830

*Decimal points have been omitted.


Once the factorial reduction of $R$ has been accomplished and the $r$th
residuals checked for an indication of the completeness of extraction, the
diagonal matrices $V^2$ and $V$ are computed by taking the reciprocals of $U^2$
and $U$, respectively. Each row of the factor structure and pattern is then
multiplied by $v_{ii}$ to obtain the uniqueness-augmented structure, $B$, and the
uniqueness-augmented pattern, $C$. These are shown in Table 3.

The next step is forming the matrix $Q$. This is done by summing
uniqueness-augmented, structure-pattern cross products as follows:

(16)
$$Q = \begin{bmatrix}
1 + \sum_i B_{i1}C_{i1} & \sum_i B_{i2}C_{i1} & \cdots & \sum_i B_{ir}C_{i1} \\
\sum_i B_{i1}C_{i2} & 1 + \sum_i B_{i2}C_{i2} & \cdots & \sum_i B_{ir}C_{i2} \\
\vdots & \vdots & & \vdots \\
\sum_i B_{i1}C_{ir} & \sum_i B_{i2}C_{ir} & \cdots & 1 + \sum_i B_{ir}C_{ir}
\end{bmatrix}$$

The summations are performed across tests, including the criterion variable,
and whatever predictor variables one may wish to include in the
equation. The $Q$ matrix for all seven variables is shown for the illustrative
example in Table 3. It is important to remember to add unity to the
cross-product summations for the diagonal values of $Q$.

Table 4 shows the methods outlined for predicting variable 1. Matrices
$B$ and $C$ were obtained from Table 3 and matrix $Q^{-1}$ by inversion of the $Q$
matrix (involving all seven variables) in Table 3. The subsequent operations
are illustrated using variable 1 as the criterion variable. It is seen that,
in the usual practical situation, only single rows of the subsequent matrices
need to be computed. Hence, only the first row of each of the subsequent


TABLE 2

The Factor Statistics*

*Decimal points have been omitted except in $\Phi$ and $\Phi^{-1}$.

[Table entries are not recoverable from the scanned source.]

TABLE 3

Uniqueness-augmentation of the Factor Statistics

[Table entries are not recoverable from the scanned source.]

TABLE 4

Computations for Regression Statistics

[Table entries are not recoverable from the scanned source.]



matrices is shown in Table 4. Row 1 of the matrix of products $v_{11}v_{ii}$ was
obtained by multiplying each $v_{ii}$ by 1.3484 ($v_{11}$ from Table 3). The first row
of $R^{-1}$ is then obtained by multiplying each element of the row of $BQ^{-1}C'$
by the corresponding element of the same row of products and reversing the
sign. The $j$th element (i.e., the diagonal element) of the row (in this case, the
first cell entry) is then adjusted by adding $v_j^2$ from Table 3. Thus the 1.6893
in the first cell of $R^{-1}$ in Table 4 is obtained by multiplying
0.0709 × (−1.8182) = −0.1289 and adding 1.8182.

The regression coefficients for predicting variable 1 are then obtained
by multiplying each element in the row of $R^{-1}$ by $-1/d_j$, where $d_j$ is the
diagonal element of $R^{-1}$.

The inverse of $R$ may be checked by recalling that $RR^{-1} = I$. In the
present example the first row of $R$ multiplied by the first row of $R^{-1}$ gives
0.9997, and the second row of $R$ multiplied by the first row of $R^{-1}$ gives
−0.0004. It is, of course, the complete correlation matrix and its inverse
that is involved here, rather than the reduced matrix shown in Table 1.

The square of the multiple correlation of variable 1 on the other six
variables is obtained by multiplying the first row of the $\beta$ matrix by the
first row of $R$, omitting $R_{11}$. This gives $R^2_{1.234567} = 0.408182$ and
$R_{1.234567} = .6389$. Use of formula (9a) gives $R^2_{1.234567} = 0.408039$ and
$R_{1.234567} = .6388$, the value obtained by Dwyer. $\beta_{12}$ is
0.3881 × 0.59196 = .2297.

To obtain the partial correlation between variables 1 and 2 holding
constant the remaining five variables, the diagonal element of the second
row of $R^{-1}$ is required. The corresponding element of $BQ^{-1}C'$ is
(0.1573)(1.8151) + (0.0239)(0.1927) = 0.2901; $v_2^2 = 4.1665$. The negative
product of these is −1.2087, and $d_2 = 2.9578$. The partial correlation
coefficient is then obtained from the (1, 2) cell of $R^{-1}$:

$(-0.3881)(-1/\sqrt{d_1})(1/\sqrt{d_2}) = -0.3881 \times -0.7694 \times 0.5815 = 0.1736.$

By similar operations, applying (12) and (13), regression statistics for the
prediction of factor scores may be obtained.
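
The arithmetic of the worked example can be retraced in a few lines. The sketch below, in Python, uses only figures quoted in the text (the first two diagonal elements and the (1, 2) cell of the inverse); the variable names are hypothetical.

```python
import math

# Figures quoted in the worked example.
d1 = 1.6893          # diagonal element of the inverse for variable 1
d2 = 2.9578          # diagonal element of the inverse for variable 2
r_inv_12 = -0.3881   # (1, 2) cell of the inverse

beta_12 = -r_inv_12 / d1                      # cell of the inverse times -1/d_1
r2_multiple = 1.0 - 1.0 / d1                  # squared multiple correlation from the diagonal
partial_12 = -r_inv_12 / math.sqrt(d1 * d2)   # partial correlation of variables 1 and 2

print(round(beta_12, 4), round(r2_multiple, 6),
      round(math.sqrt(r2_multiple), 4), round(partial_12, 4))
# prints 0.2297 0.408039 0.6388 0.1736
```

The four printed values agree with the .2297, 0.408039, .6388, and 0.1736 reported in the text.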


Discussion

The methods of regression analysis from uniqueness-augmented factor
statistics given by Dwyer [1] are formulated in terms of determinants.
Generalization of Dwyer's method is possible for the oblique factor statistics.
Both Dwyer's method and the one presented herein in matrix terms are readily
adapted to machine methods of statistics. By having either method in terms
of oblique factor statistics, multiple-group extraction methods may be used
to minimize residual computations without requiring orthogonalization of the
factor matrices.

These techniques are useful when it is desired to obtain: (i) the regression
of each variable on the n − 1 remaining variables; (ii) the partial regression
of each pair of variables, holding constant the remaining n − 2 variables;
(iii) the regression weights for the prediction of test scores; (iv) the regression
weights for the prediction of factor scores. They can also be used to set up
standard procedures for routine treatment of batteries by machine methods
of statistics.


REFERENCES 
[1] Dwyer, P. S. The evaluation of multiple and partial correlation coefficients from the
factorial matrix. Psychometrika, 1940, 5, 211-232. 


[2] Dwyer, P. S. The relative efficacy and economy of various test selection methods. 
PRS Report 957, AGO. 12 June 1952. 


[3] Guttman, L. Multiple rectilinear prediction and the resolution into components. 
Psychometrika, 1940, 5, 75-99. 


[4] Guttman, L. and Cohen, J. Multiple rectilinear prediction and the resolution into com- 
ponents: II. Psychometrika, 1943, 8, 169-183. 


[5] Horst, P. (Ed.) The prediction of personal adjustment. SSRC Bull., 1941, 48, pp.
437ff. 


[6] Thorndike, R. L. Personnel selection. New York: Wiley, 1949. 

[7] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
Manuscript received 7/16/56 

Revised manuscript received 5/23/57 


PSYCHOMETRIKA, VOL. 23, NO. 1
MARCH, 1958 


ERROR OF MEASUREMENT AND THE SENSITIVITY OF 
A TEST OF SIGNIFICANCE 


J. P. SUTCLIFFE*
UNIVERSITY OF SYDNEY 


Implications of random error of measurement for the sensitivity of
the F test of differences between means are elaborated. By considering the
mathematical models appropriate to design situations involving true and
fallible measures, it is shown how measurement error decreases the sensitivity
of a test of significance. A method of reducing such loss of sensitivity is
described and recommended for general practice.

In the statistical theory of sampling, explicit attention is given to
sampling error, which refers to fluctuations in the composition of samples
drawn at random from a defined universe. A second form of error, largely
ignored in this context, is measurement error. This applies to the individual
sampling units and is thus related to the definition of the universe rather
than sampling outcomes. Applications of sampling theory have proceeded
on the implicit assumption that the sampling units which make up the
defined universe are error free, that (in psychometric terms) the universe
consists of true scores. This assumption is not justified in practice, where
measurement is seldom free from error. Parameters, such as the mean and
the variance, of a universe of fallible scores will differ from those of a universe
of true scores; tests of significance of a given effect will not necessarily be the
same in the two cases. This paper elaborates the implications of measurement
error for the simple case of the F test of difference between means. By setting
up the mathematical models appropriate to the relevant design situations,
it is shown how measurement error (relative to the parallel true score case)
decreases the sensitivity of the test of significance. Sensitivity refers to the
likelihood of detecting a nonzero population effect at a given level of
significance. Through its inverse, proneness to Type II error, it is usually expressed
quantitatively as power. A method of reducing such loss of sensitivity is
described.


Definition of Universes of Scores

The scale or range of application of a measuring instrument comprises
a number of units of measurement. Let $u$ represent any one unit or subrange
of the scale and $v$ any one occasion of measurement. Errors of measurement
constant for all units of the scale on all occasions of testing will be designated
$f$; errors constant for all occasions of measurement with a particular unit,
but variable from unit to unit, will be designated $g_u$; errors variable from
occasion to occasion and from unit to unit will be designated $h_{uv}$. For example,
a carpenter's tape may be incorrectly calibrated uniformly over the whole
scale; then unevenly stretched over the first few feet which are most commonly
used; and finally subject to random error on any given application. For this
case the total error of measurement $E = f + g_u + h_{uv}$. Analogous errors of
measurement occur with psychological tests [3], but these will not be discussed
here; while knowledge of the source of error can facilitate its control,
it is rather the mode of operation of error which is relevant to the statistical
argument.

*I wish to express my thanks in acknowledgement that the present form of this paper
has benefited from editorial comment, and the advice of Mulhall of the Department of
Mathematics, University of Sydney.

Most generally, an obtained fallible measure or score, $X_i$, can be
expressed as the sum of the true score, $T_i$, and its error of measurement,
$E_i$ [3]. This holds whether measurement error is unitary, or complex in the
sense illustrated above. The additive relationship also holds whatever other
relationship may be shown to obtain between true score and error for a
universe of obtained scores. For instance, while $E'_i$ may enter as a multiplier
in the relationship between obtained and true score, $X_i = E'_i T_i$, $X_i$ may
also be written $X_i = T_i + E_i$, where $E_i = (E'_i - 1)T_i$. Other assumptions
about the nature of error and its relationship to true score are tenable, but
the additive assumption is adopted here because it simplifies the subsequent
analysis.


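The mean and variance of an infinite universe of fallible scores
$X_i = T_i + E_i$ may be obtained as follows, where lower-case letters denote
deviations from the respective means:

$$\text{Mean} = \lim_{N \to \infty} \Big[\sum^{N} X_i / N\Big] = \lim_{N \to \infty} \Big[\sum^{N} (T_i + E_i)/N\Big] = \bar{T} + \bar{E},$$

$$\text{Variance} = \lim_{N \to \infty} \Big[\sum^{N} x_i^2 / N\Big] = \lim_{N \to \infty} \Big[\sum^{N} (t_i + e_i)^2 / N\Big] = \sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e .$$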
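
The variance decomposition above is an algebraic identity and can be checked on any finite set of scores. A minimal sketch in Python, assuming a small made-up list of true scores and errors (the numbers and function names are illustrative, not from the paper):

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: mean squared deviation from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def correlation(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))

true_scores = [10.0, 12.0, 15.0, 9.0, 14.0]   # hypothetical T values
errors = [0.5, -1.0, 1.5, 0.0, -0.5]          # hypothetical E values
obtained = [t + e for t, e in zip(true_scores, errors)]   # X = T + E

lhs = variance(obtained)
rho = correlation(true_scores, errors)
rhs = (variance(true_scores) + variance(errors)
       + 2 * rho * math.sqrt(variance(true_scores) * variance(errors)))

mean_gap = abs(mean(obtained) - (mean(true_scores) + mean(errors)))
variance_gap = abs(lhs - rhs)
print(mean_gap < 1e-9, variance_gap < 1e-9)
```

Setting every error to the same constant makes the error variance zero, which corresponds to the constant-error situation in which only the mean is shifted; the correlation helper is undefined in that case, since it divides by a zero variance.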


These outcomes are summarized in Table 1. Depending upon the mode of
operation of error, cases may arise where any or all of $\bar{E}$, $\sigma_e^2$ and $\rho_{te}$ are zero,
in which cases one or more of the parameters will be common to the universes
of true and obtained scores.

TABLE 1

Parameters of Universes of True, Error and Obtained Scores

Universe                 Mean                   Variance

True scores $T_i$        $\bar{T}$              $\sigma_t^2$
Error scores $E_i$       $\bar{E}$              $\sigma_e^2$
Obtained scores $X_i$    $\bar{T} + \bar{E}$    $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$

When error is absent, the mean = $\bar{T}$ and variance = $\sigma_t^2$ (Case 1). When
error is constant, $\bar{E} = f \neq 0$, $\sigma_e^2 = 0$, $\rho_{te} = 0$; hence the mean of fallible
scores = $\bar{T} + f$, and variance = $\sigma_t^2$ (Case 2). Where error is variable its
distribution may be either random or nonrandom. (In either case, the
variances of error about different true score values may be homogeneous
or heterogeneous. Heterogeneity of variance permits nonzero correlation
between true scores and the variance of errors about them, but, as in random
sampling, this correlation is independent of $\rho_{te}$. Heterogeneity of error variance
should, of course, be taken into account in any analysis of variance [2].)
If errors occur at random about $T_i$, then $\bar{E} = 0$, $\sigma_e^2 > 0$, and $\rho_{te} = 0$; hence
mean = $\bar{T}$, and variance = $\sigma_t^2 + \sigma_e^2$ (Case 3). If errors are randomly
distributed about $T_i + f$, $\bar{E} = f \neq 0$, $\sigma_e^2 > 0$, $\rho_{te} = 0$; hence mean = $\bar{T} + f$,
and variance = $\sigma_t^2 + \sigma_e^2$ (Case 4). Where errors are distributed randomly
about $T_i + g_u$, then $\bar{E} = \bar{g} \neq 0$, $\sigma_e^2 > 0$, $\rho_{te} \neq 0$, and hence mean = $\bar{T} + \bar{g}$,
and variance = $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$ (Case 5). With nonrandom distribution
of errors, generally one would find $\bar{E} \neq 0$, $\sigma_e^2 > 0$, and $\rho_{te} \neq 0$. Whether
errors are distributed about $T_i$, $T_i + f$, or $T_i + g_u$, mean = $\bar{T} + \bar{E}$,
and variance = $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$. All cases of nonrandom distribution
of error are here referred to as Case 6.

The six cases are summarized in Table 2 to enable comparison of the
parameters of fallible score universes with those of the true score universe.
In no case are both parameters the same as those in Case 1; however, Case 2
has the same variance, and Case 3 the same mean. Cases 1 and 2 are unlikely
to occur in practice. Most experiments aim to achieve the conditions of Case
3, but the intrusion of constant errors, scale biases, and other nonrandom
errors is quite common. The following discussion will concentrate on Cases 1
and 3, with some comment on the others.

TABLE 2

Parameters of Universes of True and Fallible Scores

Case    Mean                   Variance

1       $\bar{T}$              $\sigma_t^2$
2       $\bar{T} + f$          $\sigma_t^2$
3       $\bar{T}$              $\sigma_t^2 + \sigma_e^2$
4       $\bar{T} + f$          $\sigma_t^2 + \sigma_e^2$
5       $\bar{T} + \bar{g}$    $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$
6       $\bar{T} + \bar{E}$    $\sigma_t^2 + \sigma_e^2 + 2\rho_{te}\sigma_t\sigma_e$




Comparison of the Design Models


With the universes of true and fallible scores defined, it becomes possible 
to compare the sensitivity of tests of significance applied in given cases. For 
comparative purposes the analysis of variance for Case 1 will be described.
Then two analyses for Case 3 will be considered—the first reflecting common 
practice, the second involving random replication of measurement to increase 
reliability and hence sensitivity. 


Notation and plan for Case 1 


Consider the comparison of means of independent random samples of 
true scores obtained at different levels of a single-treatment classification. 
Let $i = 1, 2, \ldots, a$ represent any one of the treatment levels within the
treatment classification A. Let $j = 1, 2, \ldots, b$ represent any one subject
in a sample of subjects B. Then $X_{ij}$ is the true score of the subject j in the
treatment level or group i. As subjects are randomly sampled, j represents
number only, not rank within a group. Let a dot in place of a subscript
represent summation across the class indicated by the subscript replaced, e.g.,

$$\sum_{j=1}^{b} X_{ij} = X_{i\cdot}, \qquad \sum_{i=1}^{a}\sum_{j=1}^{b} X_{ij} = X_{\cdot\cdot}.$$

The sample values of $X_{ij}$ and the sums are represented in Table 3.


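TABLE 3

Plan of Obtained Scores of Subjects Within Random Samples Allocated to Independent Treatment Groups

A                                B Subjects
Treatments   1          2          ...   j          ...   b          Sums

1            $X_{11}$   $X_{12}$   ...   $X_{1j}$   ...   $X_{1b}$   $X_{1\cdot}$
2            $X_{21}$   $X_{22}$   ...   $X_{2j}$   ...   $X_{2b}$   $X_{2\cdot}$
...
i            $X_{i1}$   $X_{i2}$   ...   $X_{ij}$   ...   $X_{ib}$   $X_{i\cdot}$
...
a            $X_{a1}$   $X_{a2}$   ...   $X_{aj}$   ...   $X_{ab}$   $X_{a\cdot}$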


Analysis of variance for Case 1 


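The total variance of the ab sample values of $X_{ij}$ can be expressed in
terms of two sources of variation: between treatments, A, and between subjects
within treatment levels, $B_w$. A given deviation score may be written as

$$(X_{ij} - \bar{X}_{\cdot\cdot}) = (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) + (X_{ij} - \bar{X}_{i\cdot}),$$

where $\bar{X}_{i\cdot} = X_{i\cdot}/b$ and $\bar{X}_{\cdot\cdot} = X_{\cdot\cdot}/ab$ are the corresponding means. The total sum of squares is

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{\cdot\cdot})^2 = b\sum_{i=1}^{a} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i\cdot})^2.$$

The degrees of freedom pertaining to these components are Total = (ab − 1),
A = (a − 1), $B_w$ = a(b − 1). From the SS and df, the mean squares, $S^2$,
may be obtained as unbiased estimates (on the null hypothesis) of a common
population variance.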
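The partition of the total sum of squares can be checked on a small array of scores. The sketch below assumes equal group sizes, as in the plan of Table 3; the function name and the three groups of scores are hypothetical.

```python
def one_way_anova(groups):
    """Partition the total sum of squares for a groups of b scores each."""
    a = len(groups)
    b = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (a * b)
    group_means = [sum(g) / b for g in groups]

    # Between treatments (A) and between subjects within treatments (B_w).
    ss_a = b * sum((m - grand_mean) ** 2 for m in group_means)
    ss_bw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

    ms_a = ss_a / (a - 1)
    ms_bw = ss_bw / (a * (b - 1))
    return {"ss_a": ss_a, "ss_bw": ss_bw, "ss_total": ss_total,
            "ms_a": ms_a, "ms_bw": ms_bw, "F": ms_a / ms_bw}

# Hypothetical scores: a = 3 treatment levels, b = 3 subjects per level.
result = one_way_anova([[3, 5, 4], [6, 8, 7], [9, 9, 12]])
print(result)
```

For these scores the between and within components are 54 and 10, which add to the total of 64.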


Expectation of mean squares for Case 1 


To determine what is estimated by a given $S^2$, one takes the expectation
according to the model involved. As Case 1 involves a universe of true scores,
Model 1 can be written as

$$X_{ij} = A_i + B_{ij}.$$

$A_i$ is the class of treatment parameters of which the sampled treatment
means are estimators. The distribution of $A_i$ will vary according as treatments
are fixed constants or randomly sampled. For convenience the case
of random $A_i$ with variance $\sigma_A^2$ will be considered. $B_{ij}$ is the class of true score
deviations from $A_i$, which are normally distributed with zero mean and
variance $\sigma_B^2$. To find the expected values of SS and then $S^2$, one substitutes
model values in the analysis of sample variance and thereby determines the
limiting value of a given component.

(i) Expectation of $S_A^2$

$$E\Big[b\sum_{i=1}^{a} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2\Big] = E\Big[b\sum_{i=1}^{a} (A_i - \bar{A})^2 + b\sum_{i=1}^{a} (\bar{B}_{i\cdot} - \bar{B}_{\cdot\cdot})^2\Big] = b(a - 1)\sigma_A^2 + b(a - 1)\sigma_B^2/b.$$

Thus $S_A^2 = b\sum_{i=1}^{a} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2/(a - 1)$ estimates $b\sigma_A^2 + \sigma_B^2$.

(ii) Expectation of $S_{B_w}^2$

$$(X_{ij} - \bar{X}_{i\cdot}) = (B_{ij} - \bar{B}_{i\cdot}),$$

and

$$E\Big[\sum_{i=1}^{a}\sum_{j=1}^{b} (B_{ij} - \bar{B}_{i\cdot})^2\Big] = a(b - 1)\sigma_B^2,$$

so that $S_{B_w}^2$ estimates $\sigma_B^2$.



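TABLE 4

Analysis of Variance for Model 1:
Single Treatment Classification Design with
b Randomly Sampled Subjects for Each of a Levels (True Scores)

Number  Source      Sum of Squares                                            df        $S^2$        Expectation of $S^2$

1       A           $b\sum_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$     (a − 1)   $S_A^2$      $b\sigma_A^2 + \sigma_B^2$
2       B within A  $\sum_i\sum_j (X_{ij} - \bar{X}_{i\cdot})^2$              a(b − 1)  $S_{B_w}^2$  $\sigma_B^2$
3       Total       $\sum_i\sum_j (X_{ij} - \bar{X}_{\cdot\cdot})^2$          (ab − 1)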


These outcomes for the analysis of variance are summarized in Table 4.
On the null hypothesis $\sigma_A^2 = 0$. One rejects the null hypothesis if the ratio
$F_1 = S_A^2/S_{B_w}^2$, with $df_1 = (a - 1)$ and $df_2 = a(b - 1)$, exceeds $F_\alpha$, the tabled
value for the chosen level of significance.
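The expectations in Table 4 can be checked by simulation. The sketch below is a Monte Carlo illustration under the random-effects form of Model 1, with normally distributed treatment effects and true-score deviations; the function name, the seed, and the variance values are hypothetical choices made only for the example.

```python
import random

def simulate_mean_squares(a, b, var_a, var_b, reps=4000, seed=1):
    """Average S_A^2 and S_Bw^2 over repeated samples from Model 1."""
    rng = random.Random(seed)
    sd_a, sd_b = var_a ** 0.5, var_b ** 0.5
    sum_ms_a = 0.0
    sum_ms_bw = 0.0
    for _ in range(reps):
        groups = []
        for _i in range(a):
            effect = rng.gauss(0.0, sd_a)  # A_i, random treatment effect
            groups.append([effect + rng.gauss(0.0, sd_b) for _j in range(b)])
        means = [sum(g) / b for g in groups]
        grand = sum(means) / a
        ss_a = b * sum((m - grand) ** 2 for m in means)
        ss_bw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
        sum_ms_a += ss_a / (a - 1)
        sum_ms_bw += ss_bw / (a * (b - 1))
    return sum_ms_a / reps, sum_ms_bw / reps

# Hypothetical design: a = 4 levels, b = 5 subjects, variances 4 and 9.
ms_a, ms_bw = simulate_mean_squares(4, 5, 4.0, 9.0)
print(ms_a, "should be near", 5 * 4.0 + 9.0)
print(ms_bw, "should be near", 9.0)
```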


Case 3


It is common practice in psychological experimentation to use a design
superficially similar to the one just described. That is, one has a series of
random samples of subjects allocated to treatment levels and for each subject
one has a single score. If, as is usually the case, the scores are fallible, then
Model 1 is inapplicable and instead one must write the model to include
error of measurement. Assuming that the scores have been drawn from a Case
3 universe, there will be two designs according as one has or has not random
replication of measurement on a given subject. For common practice, which
provides no measurement replication, Model 3a is

$$X_{ij} = A_i + B_{ij} + \Gamma_{ij}.$$

$A_i$ and $B_{ij}$ have been defined above; $\Gamma_{ij}$ is the random error of measurement
component, normally distributed with zero mean and variance $\sigma_\Gamma^2$. The
summary of the analysis of variance for Model 3a is given in Table 5. For
the test of significance, the null hypothesis is $\sigma_A^2 = 0$. One rejects the null
hypothesis if the ratio $F_{3a} = S_A^2/S_{B_w}^2$, with $df_1 = (a - 1)$ and $df_2 = a(b - 1)$,
exceeds the tabled value of F for the chosen level of significance.

One may note that the terms $\sigma_A^2$ and $\sigma_B^2$ are common to the expectations
of $S_A^2$ for Models 1 and 3a. In addition, the $df_1$ and $df_2$ are the same for $F_1$
and $F_{3a}$. This enables comparison of the sensitivity of the two tests. The
power of the $F_1$ test is $\mathrm{Prob}\{F > F_\alpha\sigma_B^2/(b\sigma_A^2 + \sigma_B^2)\}$; and the power of $F_{3a}$
is $\mathrm{Prob}\{F > F_\alpha(\sigma_B^2 + \sigma_\Gamma^2)/(b\sigma_A^2 + \sigma_B^2 + \sigma_\Gamma^2)\}$. The smaller the value to the
right of >, the greater the power of the test. As $\sigma_B^2/(b\sigma_A^2 + \sigma_B^2) < (\sigma_B^2 + \sigma_\Gamma^2)/(b\sigma_A^2 + \sigma_B^2 + \sigma_\Gamma^2)$,
the power of $F_1$ is greater than the power of $F_{3a}$. That is,
analysis in accordance with Model 3a provides a less sensitive test of the
hypothesis $\sigma_A^2 > 0$ than does Model 1; the loss of sensitivity is due to the
intrusion of random error of measurement.


TABLE 5

Analysis of Variance for Model 3a:
Single Treatment Classification Design with
b Randomly Sampled Subjects for Each of a Levels (Fallible Scores)

Number  Source      Sum of Squares                                            df        $S^2$        Expectation of $S^2$

1       A           $b\sum_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$     (a − 1)   $S_A^2$      $b\sigma_A^2 + \sigma_B^2 + \sigma_\Gamma^2$
2       B within A  $\sum_i\sum_j (X_{ij} - \bar{X}_{i\cdot})^2$              a(b − 1)  $S_{B_w}^2$  $\sigma_B^2 + \sigma_\Gamma^2$
3       Total       $\sum_i\sum_j (X_{ij} - \bar{X}_{\cdot\cdot})^2$          (ab − 1)

Model 3a allows for the acknowledgement of the presence of error
variance, but there is no provision for its isolation. To achieve this, one has
to add random replication of measurement for each subject. That is, instead
of a single score for each subject one has a number of scores. This introduces
a source of variation in addition to those already accounted for; accordingly
the notation and plan presented above have to be expanded. Let $k = 1, 2, \ldots, c$
represent any one measure or score in a sample of scores C. Then
$X_{ijk}$ is the kth score of subject j at treatment level i. As measures [...]
are randomly sampled, k represents number only, not rank. Now Model
3b may be written as

$$X_{ijk} = A_i + B_{ij} + \Gamma_{ijk}.$$

$A_i$ and $B_{ij}$ have been defined above; and $\Gamma_{ijk}$ is defined as was $\Gamma_{ij}$. That
is, Model 3a is the special case of Model 3b in which c = 1. The summary of
the analysis of variance for Model 3b is given in Table 6. This analysis
provides two tests of significance.

For the first, the null hypothesis is $\sigma_B^2 = 0$. One rejects the null hypothesis
if the ratio $S_{B_w}^2/S_{C_w}^2$, with $df_1 = a(b - 1)$ and $df_2 = ab(c - 1)$, exceeds
the tabled value of F for the chosen level of significance. If the null hypothesis
is not rejected, the outcome is consistent with the homogeneity of
experimental subjects, and in that [...] one has zero reliability of measurement.
If the null hypothesis is rejected, an estimate of the reliability of
measurement may be obtained [...] the population
value of the reliability coefficient [1], i.e., $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_\Gamma^2)$, which may be
estimated by

$$r = (S_{B_w}^2 - S_{C_w}^2)/[S_{B_w}^2 - S_{C_w}^2 + cS_{C_w}^2].$$


TABLE 6

Analysis of Variance for Model 3b:
Single Treatment Classification Design with
c Random Measures on each of
b Randomly Sampled Subjects for each of
a Levels (Fallible Scores)

Number  Source      Sum of Squares                                                              df         $S^2$        Expectation of $S^2$

1       A           $bc\sum_i (\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot})^2$            (a − 1)    $S_A^2$      $bc\sigma_A^2 + c\sigma_B^2 + \sigma_\Gamma^2$
2       B within A  $c\sum_i\sum_j (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot})^2$               a(b − 1)   $S_{B_w}^2$  $c\sigma_B^2 + \sigma_\Gamma^2$
3       C within B  $\sum_i\sum_j\sum_k (X_{ijk} - \bar{X}_{ij\cdot})^2$                        ab(c − 1)  $S_{C_w}^2$  $\sigma_\Gamma^2$
4       Total       $\sum_i\sum_j\sum_k (X_{ijk} - \bar{X}_{\cdot\cdot\cdot})^2$                (abc − 1)


For the second, the null hypothesis is $\sigma_A^2 = 0$. One rejects the null
hypothesis if the ratio $F_{3b} = S_A^2/S_{B_w}^2$, with $df_1 = (a - 1)$ and $df_2 = a(b - 1)$,
exceeds the tabled value of F for the chosen level of significance.
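The Model 3b analysis can be sketched as a nested computation on scores indexed by treatment, subject, and measure. The following is an illustration only, assuming a balanced design; the function name and the tiny data set are hypothetical, and the reliability estimate uses the intraclass form $(S_{B_w}^2 - S_{C_w}^2)/[S_{B_w}^2 + (c - 1)S_{C_w}^2]$, which is how the garbled printed formula has been read here.

```python
def nested_anova(data):
    """Mean squares for data[i][j][k]: treatment i, subject j, measure k."""
    a, b, c = len(data), len(data[0]), len(data[0][0])
    subj_means = [[sum(s) / c for s in grp] for grp in data]
    trt_means = [sum(m) / b for m in subj_means]
    grand = sum(trt_means) / a

    ss_a = b * c * sum((m - grand) ** 2 for m in trt_means)
    ss_bw = c * sum((sm - tm) ** 2
                    for grp_m, tm in zip(subj_means, trt_means)
                    for sm in grp_m)
    ss_cw = sum((x - sm) ** 2
                for grp, grp_m in zip(data, subj_means)
                for s, sm in zip(grp, grp_m)
                for x in s)

    ms_a = ss_a / (a - 1)
    ms_bw = ss_bw / (a * (b - 1))
    ms_cw = ss_cw / (a * b * (c - 1))
    return {
        "ms_a": ms_a, "ms_bw": ms_bw, "ms_cw": ms_cw,
        "F_subjects": ms_bw / ms_cw,    # first test: sigma_B^2 = 0
        "F_treatments": ms_a / ms_bw,   # second test: sigma_A^2 = 0
        "reliability": (ms_bw - ms_cw) / (ms_bw + (c - 1) * ms_cw),
    }

# Hypothetical scores: a = 2 treatments, b = 2 subjects, c = 2 measures.
scores = [[[1, 3], [5, 7]], [[4, 6], [10, 12]]]
print(nested_anova(scores))
```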

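Comparison of the power of the $F_{3b}$ test,

$$\mathrm{Prob}\{F > F_\alpha(c\sigma_B^2 + \sigma_\Gamma^2)/(bc\sigma_A^2 + c\sigma_B^2 + \sigma_\Gamma^2)\},$$

with the powers of $F_1$ and $F_{3a}$ shows that as

$$\frac{\sigma_B^2}{b\sigma_A^2 + \sigma_B^2} < \frac{c\sigma_B^2 + \sigma_\Gamma^2}{bc\sigma_A^2 + c\sigma_B^2 + \sigma_\Gamma^2} < \frac{\sigma_B^2 + \sigma_\Gamma^2}{b\sigma_A^2 + \sigma_B^2 + \sigma_\Gamma^2},$$

then power $F_1$ > power $F_{3b}$ > power $F_{3a}$.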
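The ordering of the three tests can be illustrated by computing the factor that multiplies the tabled value $F_\alpha$ in each power expression; a smaller factor means a more powerful test. The sketch below assumes the three expectations tabled for Models 1, 3a, and 3b; the function name and the variance components are hypothetical.

```python
def critical_multipliers(var_a, var_b, var_err, b, c):
    """Factor multiplying the tabled F in the power expression of each test."""
    m1 = var_b / (b * var_a + var_b)
    m3a = (var_b + var_err) / (b * var_a + var_b + var_err)
    m3b = (c * var_b + var_err) / (b * c * var_a + c * var_b + var_err)
    return {"F1": m1, "F3b": m3b, "F3a": m3a}

# Hypothetical components: sigma_A^2 = 1, sigma_B^2 = 4, error variance 4,
# with b = 10 subjects per level and c = 4 measures per subject.
mult = critical_multipliers(1.0, 4.0, 4.0, b=10, c=4)
print(mult)
```

With c = 1 the 3b factor reduces to the 3a factor, in line with Model 3a being the special case of a single measure per subject.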


While analysis by the Model 3b allows for isolation of an estimate
of $\sigma_\Gamma^2$, it is important to note that one may not convert $F_{3b}$ to $F_1$ by subtracting
$S_{C_w}^2$, the estimate of $\sigma_\Gamma^2$, from the numerator and denominator of $F_{3b}$ and making
appropriate adjustments for the weights b and c. F is the ratio of two independent
$\chi^2$ variates; the independence is negated by such a procedure.
The only way to achieve the standard of sensitivity of the $F_1$ test with the
given number of subjects is to use error-free measurement. As this is an ideal
towards which one can do no more than strive, one has to be satisfied with a
less sensitive test. Of the two remaining experimental designs, assuming
that one can achieve measurement replication, that which provides the 3b
form of analysis is to be recommended for general practice. It yields estimates
of measurement error variance and reliability, for the latter a test of significance,
as well as providing a more sensitive test of treatment effects than
the 3a design using the same number of subjects. These contentions
apply with equal force to the design situations where the t test is ordinarily
applied. Finally, while the argument has been in terms of the single treatment
classification design, it may be generalized to multiple classification
designs.
REFERENCES

[1] Alexander, H. W. The estimation of reliability when several trials are available. Psychometrika, 1947, 12, 79-99.
[2] Ehrenberg, A. S. C. The unbiased estimation of heterogeneous error variances. Biometrika, 1950, 37, 347-357.
[3] Walker, Helen M. and Lev, J. Statistical inference. New York: Holt, 1953.


Manuscript received 1/14/57
Revised manuscript received 4/30/57


PSYCHOMETRIKA, VOL. 23, NO. 1
MARCH, 1958 


DETERMINATION OF PARAMETERS OF A FUNCTIONAL 
RELATION BY FACTOR ANALYSIS* 


LEDYARD R TUCKER 
PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


Consideration is given to determination of parameters of a functional
relation between two variables by the means of factor analysis techniques.
If the function can be separated into a sum of products of functions of the
individual parameters and corresponding functions of the independent
variable, particular values of the functions of the parameters and of the
functions of the independent variables might be found by factor analysis.
Otherwise approximate solutions may be determined. These solutions may
represent important results from experimental investigations.


The possible use of factor analysis techniques to determine parameters
of nonlinear functional relations has been a topic for occasional informal
discussion. If a factorial approach could be developed it would have considerable
application to experimental problems such as learning curves,
work decrement curves, dark adaptation curves, etc. This note gives a
theoretical basis for determination of parameters by factor analysis for
many nonlinear functions.

Factor analytic methods have been limited to investigations applying
linear functions of the form (see [2], equation 3, p. [illegible])

(1) $s_{ji} = \sum_{m=1}^{r} a_{jm}s_{mi}$,

where the $s_{ji}$ are the observations, and $a_{jm}$ and $s_{mi}$ are to be estimated. The
$a_{jm}$ are task parameters and the $s_{mi}$ are individual parameters.

In the present context we will consider the functional relation between
two variables x and y. Variable x might be termed the independent variable
and y might be termed the dependent variable. A general statement of this
functional relation for any given individual i is given by

(2) $y_i = \phi(p_{vi}, x)$,

for which there are a number of parameters $p_v$ which have specific values


*[...] in part by Princeton University, the Office of [...], and the National Science Foundation under grant NSF G-642.


$p_{vi}$ for each individual. Such a relation is shown graphically in Fig. 1. There
exists a family of functions of the form of any given $\phi$ with the values of $p_{vi}$
defining the particular member of the family. Let j be a particular point of
this function with coordinates $x_j$ and $y_{ji}$. Then

(3) $y_{ji} = \phi(p_{vi}, x_j)$.


Many functions may be transformed so as to produce

(4) $y_{ji} = \sum_{m=1}^{r} f_m(x_j)F_m(p_{vi})$.

The $f_m(x_j)$ are a number of functions of the independent variable $x_j$. The
$F_m(p_{vi})$ are corresponding functions of the parameters $p_{vi}$. The number, r,
of such functions may be finite, or it may be infinite. In this latter case, (4)
represents an infinite series, such as Maclaurin's or Taylor's power series or
Fourier's trigonometric series (see a standard advanced calculus text, e.g.,
[1], [3]). Frequently, in this case, a small number of terms of the series will
yield an adequate approximation to the $y_{ji}$. In order to make (1) applicable
it is only necessary to define

(5) $a_{jm} = f_m(x_j)$,

(6) $s_{mi} = F_m(p_{vi})$.

Then

(7) $y_{ji} = \sum_{m=1}^{r} a_{jm}s_{mi}$.


FIGURE 1
A Functional Relation of the Form of (2)


In the present context the $s_{mi}$ will be considered as derived parameters
of the transformed function. While they may be expressible in terms of more
primitive parameters, they do have the property of determining the particular
function for each individual. The family of functions is defined by the $a_{jm}$.
As a consequence of (7), observations of $y_{ji}$ for several given $x_j$ and individuals
i may be entered into a score matrix. Each $x_j$ might be used to produce one
statistical variable. Estimates of the $a_{jm}$ and $s_{mi}$ then can be obtained by
factor analysis techniques.

In order to illustrate the foregoing, consider a learning task for which
the learning curve is a simple exponential function, such as

(8) $y_{ji} = e^{(b_i + t_j)}$,

where $y_{ji}$ is the performance of individual i on trial j, $b_i$ is a parameter for
individual i, and $t_j$ is the number of trials j. $t_j$ replaces $x_j$ as the independent
variable in this context, and $b_i$ replaces the parameters $p_{vi}$. Equation (8)
may be transformed to

(9) $y_{ji} = e^{t_j}e^{b_i}$.

Then

(10) $a_{j1} = f_1(t_j) = e^{t_j}$,

(11) $s_{1i} = F_1(b_i) = e^{b_i}$.


In this case only one term of the sum of products indicated in (4) and (7)
exists. From (9), (10), and (11),

(12) $y_{ji} = a_{j1}s_{1i}$.

For this simple case, observations are made of the performances on the
learning task for each of a number of individuals at each of a selected number
of trials. These observations yield a matrix of $y_{ji}$. A factor analysis will
involve a single factor, and yield estimates of the $a_{j1}$ and $s_{1i}$.

The [...] problems of communalities and rotation of axes
remain to be discussed. In the present context it seems appropriate to assume
that each observed $y_{ji}$ may be in error, but the assumption [...]
seems inappropriate. As a consequence, reliability estimates should be placed
in the diagonals of the matrix of intercorrelations. The rotation of axes problem
remains unsolved in the present case. [...]
axes may be rotated. It is doubtful, moreover, that [...]
structure is applicable [...] the factor loadings [...]
the functions $f_m(x_j)$ for the selected points. Some other principle,
[...] rotation of the axes.

[...] to the obverse factor [...] of measures. A[...]
procedures, where people are [...]. A
large number of values of $x_j$ are selected and the $y_{ji}$ are observed for a group
of individuals. Each of these individuals can be considered as a variable and
correlations of the $y_{ji}$ can be obtained for pairs of individuals. The $s_{mi}$ are
now the factor loadings, and the $a_{jm}$ are the factor scores. The communalities
and rotation of axes aspects of the analysis are quite similar to the corresponding
aspects of the first procedure already discussed. One important
difference between the present analysis by persons and the previous alternative
stems from the more direct determination of the $s_{mi}$. An inspection of
the matrix of $s_{mi}$ might reveal a curvilinear relation between the $s_{mi}$ for
several m. Any such relation as the entries in one row being proportional to
the square of the entries in another row would indicate a relation to a common,
more primitive parameter. The entries in one row being proportional to the
product of corresponding entries in two other rows would also be indicative
of more primitive parameters. Rotation of axes might be performed so as to
reveal such relations.
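The single-factor structure of equation (12) can be illustrated numerically. The sketch below is not a factor analysis in the full sense (no correlations or communalities are involved); it builds an error-free score matrix from hypothetical trial values and individual parameters and recovers one pair of factors by a simple power iteration, a technique chosen here for brevity and not taken from the note. The function names and all numerical values are assumptions.

```python
import math

def score_matrix(t, b):
    """Error-free scores y[j][i] = exp(b_i + t_j), rows = trials j."""
    return [[math.exp(tj + bi) for bi in b] for tj in t]

def leading_factor(Y, iters=50):
    """One factor pair (a, s) with Y[j][i] approximately a[j] * s[i]."""
    n_rows, n_cols = len(Y), len(Y[0])
    s = [1.0] * n_cols
    for _ in range(iters):
        a = [sum(Y[j][i] * s[i] for i in range(n_cols)) for j in range(n_rows)]
        norm = math.sqrt(sum(x * x for x in a))
        a = [x / norm for x in a]          # a is kept at unit length
        s = [sum(Y[j][i] * a[j] for j in range(n_rows)) for i in range(n_cols)]
    return a, s

# Hypothetical trials and individual parameters.
t = [0.0, 0.5, 1.0, 1.5]
b = [-0.3, 0.1, 0.4]
Y = score_matrix(t, b)
a, s = leading_factor(Y)

# Largest discrepancy between the scores and the one-factor reproduction.
residual = max(abs(Y[j][i] - a[j] * s[i])
               for j in range(len(t)) for i in range(len(b)))
print(residual)
```

The factors are determined only up to a scale constant: the recovered a is proportional to $e^{t_j}$ and s to $e^{b_i}$, which is the indeterminacy the rotation and scaling discussion refers to.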

In any particular situation, the choice as to which variable is to be the
independent variable x and which variable is to be the dependent variable y
may be quite important. In a learning experiment for a list of paired associates,
each trial might be an $x_j$, and the proportion of correct responses be the
observed $y_{ji}$. However, selected proportions of correct responses might be
taken as the $x_j$, and the numbers of trials necessary to reach these proportions
taken as the $y_{ji}$. Consider a slightly more complex exponential learning
curve than that given in (8), such that

(13) $P = e^{(c_i t + b_i)}$,

where P is the measure of performance. The parameter $c_i$ has been included
as a multiplier to t. This function does not separate in the manner that (8)
did unless an infinite series is used. In which case, if values of $t_j$ are chosen
and values of $P_{ji}$ are observed, the factor analysis will not involve a definite
number of factors. Each successive factor will permit a closer approximation
of the series to the function. Some finite number of factors might be found to
be adequate.

If logarithms are taken of both sides of (13), it is possible to solve for t
as a function of P:

(14) $t = \frac{1}{c_i}\log P - \frac{b_i}{c_i}$.

When values of P are selected as $P_j$ and the corresponding $t_{ji}$ are observed,
then

(15) $t_{ji} = \frac{1}{c_i}\log P_j - \frac{b_i}{c_i}$.

Define

(16) $a_{j1} = \log P_j$,
(17) $s_{1i} = 1/c_i$,
(18) $a_{j2} = 1$,
(19) $s_{2i} = -b_i/c_i$.

Then

(20) $t_{ji} = a_{j1}s_{1i} + a_{j2}s_{2i}$,

which is in the form of (7). Only two factors are involved.
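The two-factor form can be checked directly. The sketch below assumes the reading $P = e^{(c_i t + b_i)}$ for (13), so that the second individual factor carries a minus sign; the function name, the selected performance levels, and the individual parameters are hypothetical.

```python
import math

def trials_to_levels(P_levels, b, c):
    """t[j][i] from the two-factor form: a_j1*s_1i + a_j2*s_2i."""
    a1 = [math.log(p) for p in P_levels]       # a_j1 = log P_j
    a2 = [1.0 for _ in P_levels]               # a_j2 = 1
    s1 = [1.0 / ci for ci in c]                # s_1i = 1 / c_i
    s2 = [-bi / ci for bi, ci in zip(b, c)]    # s_2i = -b_i / c_i
    return [[a1[j] * s1[i] + a2[j] * s2[i] for i in range(len(c))]
            for j in range(len(P_levels))]

# Hypothetical levels of performance and individual parameters.
P_levels = [0.25, 0.5, 0.75]
b = [-3.0, -2.0]
c = [0.5, 0.25]
t = trials_to_levels(P_levels, b, c)

# Substituting each t back into the exponential curve returns the level P_j.
for j, p in enumerate(P_levels):
    for i in range(len(c)):
        print(p, math.exp(c[i] * t[j][i] + b[i]))
```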
Another extension from (8) is to introduce an additive constant $d_i$:

(21) $P = e^{(b_i + t)} + d_i$.

Individual parameters and the variable t may be separated for (21) in the
same manner as given for (8). There are now two factors.

If both of the foregoing extensions of (8) are incorporated into a single
extension, then

(22) $P = d_i + e^{(c_i t + b_i)}$.

The individual parameters do not readily separate now from either variable
without employing an infinite series.

It is to be noted that (8) might be treated in the same manner as was
(13). The individual parameters might be separated from the variable y or P
rather than from t as given. Thus, the foregoing examples include (i) a function,
equation (8), that may be treated either way; (ii) two functions, (13)
and (21), each of which may be treated in only one manner; and (iii) a function,
(22), that cannot be separated. The two single treatment functions form
a contrast as to which variable, P or t, is taken as the independent variable.
In (13), P should be taken as the independent variable while in (21) t should
be taken as the independent variable. In any particular experimental case,
the decision as to which variable is to be treated as the independent variable
must rest on experience and the judgment of the experimenter. There are
cases where the number of factors is excessive whichever variable is taken as
the independent variable. The factorial approach may yield in some of these
cases an adequate approximation to the observations with a limited number of
factors.
REFERENCES

[1] Osgood, W. F. Advanced calculus. New York: Macmillan, 1925.
[2] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
[3] Wilson, E. B. Advanced calculus. Boston: Ginn, 1911.


Manuscript received 8/15/59 


PSYCHOMETRIKA, VOL. 23, NO. 1
MARCH, 1958 


THE INCLUSION OF RESPONSE TIMES WITHIN A STOCHASTIC 
DESCRIPTION OF THE LEARNING BEHAVIOR OF 
INDIVIDUAL SUBJECTS 


R. J. AUDLEY 


UNIVERSITY COLLEGE, LONDON 


A stochastic process applicable to the learning behavior of an individual
subject is discussed. The process describes both the response times and the
sequence of choices obtained from a situation involving two alternatives.
Parameter estimates and techniques for assessing goodness of fit are considered.


In a previous paper [2], the possibility of providing a probabilistic
description of the learning behavior of an individual subject was discussed.
A family of stochastic processes suitable for this purpose was introduced,
and problems of parameter estimation and goodness of fit were examined.
This examination was restricted to the description of the sequence of responses
made by a subject in an experimental situation involving a choice between two
alternatives, e.g., the learning of a position habit in a single-unit T-maze.
Usually, however, an investigator observes not only the choice made at each
trial but also the time taken to make the choice, which for brevity will be
referred to here as the response time. The present paper is an attempt to
include the response times within the stochastic description elaborated in
the earlier paper.
This inclusion of response times carries with it several advantages. The
estimation of parameter values can now be based upon a continuous time
variable as well as the two-valued variable, success or failure, which was the
only datum previously employed. Furthermore, there are sequences
of responses, such as a long unbroken series of successes or failures, which
make it impossible to provide parameter estimates unless response times
can be used for this purpose.
The Stochastic Processes

Originally, the processes were based on an urn scheme. Here, however,
they will be developed from [...] simple assumptions, which can be regarded
as an identification of the [...] urn scheme. To [...] recapitulation of the
scheme [...]. Consider an urn containing red and black balls, drawing
a red ball being considered equivalent to the occurrence of a correct response
and a black ball to an incorrect response.
The number of balls of the two colors is changed [...] ball is drawn, according
to certain rules. In the present paper, the number of balls of a particular
color is identified with a hypothetical mean rate of making the response
associated with this color.

For the purpose of simple exposition, attention again will be restricted
to data obtained from learning situations involving only two alternative
responses, with one response consistently rewarded. At the tth trial, it is
assumed that the probability of a correct response occurring in a small time
interval $(T, T + \Delta T)$ is $r_t\Delta T$, and of an incorrect response in the same
time interval is $w_t\Delta T$. $r_t$ and $w_t$ may be regarded as hypothetical mean
rates of responding, i.e., the distribution of response times for either response,
taken individually, is exponential. This assumption was considered for situations
with only one available response by Mueller [10]. Christie [6] has also
considered the two-choice situation as one involving the competition between
two responses emitted at independent random rates. His paper should be
consulted for a more detailed statement of the events supposed to take place
at any particular experimental trial.

The probability of no response occurring in time T will be

(1) $P_0(T) = e^{-(r_t + w_t)T}$

(e.g., see Feller [7], p. 366).

In the learning situation being considered, the first response to occur
terminates an experimental trial. Hence, the probability of a correct response
occurring at any trial is the probability that this response is the first to occur.
The probability that a correct response terminates the tth trial at time T
is from (1) and the basic assumptions equal to

$$e^{-(r_t + w_t)T}\, r_t\, \Delta T,$$

and therefore the probability of a success at the tth trial is

(2) $P(t) = \int_0^\infty e^{-(r_t + w_t)T}\, r_t\, dT = \frac{r_t}{r_t + w_t}$.
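Equation (2) can be illustrated by simulating a single trial many times as a race between two exponentially distributed response latencies. This is a Monte Carlo sketch under the stated assumption of independent exponential rates; the function name, the seed, and the rates r = 2 and w = 1 are hypothetical.

```python
import random

def simulate_trials(r, w, n_trials=20000, seed=7):
    """Race between a correct (rate r) and an incorrect (rate w) response."""
    rng = random.Random(seed)
    n_correct = 0
    total_time = 0.0
    for _ in range(n_trials):
        t_correct = rng.expovariate(r)
        t_wrong = rng.expovariate(w)
        if t_correct < t_wrong:      # the first response ends the trial
            n_correct += 1
        total_time += min(t_correct, t_wrong)
    return n_correct / n_trials, total_time / n_trials

p_hat, mean_time = simulate_trials(r=2.0, w=1.0)
print(p_hat, "compared with r/(r + w) =", 2.0 / 3.0)
print(mean_time, "compared with 1/(r + w) =", 1.0 / 3.0)
```

The second comparison uses the fact that the minimum of two independent exponential latencies is itself exponential with rate r + w.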


It is further assumed that the hypothetical response rates, $r_t$ and $w_t$, are
linear functions of the number of correct and incorrect responses in the first
t − 1 trials. Thus it is assumed

(3) $r_t = r_1 + k_t a + (t - 1 - k_t)b$,
    $w_t = w_1 + k_t c + (t - 1 - k_t)d$,

where $r_1$ and $w_1$ are the initial rates of making correct and incorrect responses,
respectively, $k_t$ is the number of correct rewarded responses in the first
(t − 1) trials, and a, b, c, and d are parameters associated with the influence
of punishment and reward upon the hypothetical response rates.


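Substituting for $r_t$ and $w_t$ in (2), the probability of a correct response on
the tth trial, given $k_t$ previous successes, is

(4) $P(t \mid k_t) = \frac{r_1 + k_t(a - b) + (t - 1)b}{r_1 + w_1 + k_t(a + c - b - d) + (t - 1)(b + d)}$.

Dividing numerator and denominator by $(r_1 + w_1)$, and putting

$$\frac{r_1}{r_1 + w_1} = p, \quad \frac{a}{r_1 + w_1} = \alpha, \quad \frac{b}{r_1 + w_1} = \beta, \quad \frac{a + c}{r_1 + w_1} = \gamma_1, \quad \frac{b + d}{r_1 + w_1} = \gamma_2,$$

gives

(5) $P(t \mid k_t) = \frac{p + k_t(\alpha - \beta) + (t - 1)\beta}{1 + k_t(\gamma_1 - \gamma_2) + (t - 1)\gamma_2}$.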
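The agreement between the rate form (3) with (2) and the normalized form (5) can be verified numerically. The sketch below uses hypothetical parameter values and function names; it only checks that the two expressions give the same probability for one trial.

```python
def response_rates(t, k_t, r1, w1, a, b, c, d):
    """Rates on trial t after k_t correct responses in the first t - 1 trials."""
    f_t = t - 1 - k_t                      # incorrect responses so far
    r_t = r1 + k_t * a + f_t * b
    w_t = w1 + k_t * c + f_t * d
    return r_t, w_t

def p_correct_from_rates(t, k_t, r1, w1, a, b, c, d):
    r_t, w_t = response_rates(t, k_t, r1, w1, a, b, c, d)
    return r_t / (r_t + w_t)

def p_correct_normalized(t, k_t, p, alpha, beta, gamma1, gamma2):
    num = p + k_t * (alpha - beta) + (t - 1) * beta
    den = 1.0 + k_t * (gamma1 - gamma2) + (t - 1) * gamma2
    return num / den

# Hypothetical parameter values.
r1, w1, a, b, c, d = 2.0, 2.0, 1.0, 0.5, 0.25, 0.75
total = r1 + w1
p, alpha, beta = r1 / total, a / total, b / total
gamma1, gamma2 = (a + c) / total, (b + d) / total

print(p_correct_from_rates(5, 3, r1, w1, a, b, c, d))
print(p_correct_normalized(5, 3, p, alpha, beta, gamma1, gamma2))
```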


Equation (5) is the fundamental expression of the earlier paper [2].
The distribution of response times at the tth trial is also completely
specified and is exponential. In particular, the mean response time, $L_t$, is
given by

(6) $L_t = \int_0^\infty T e^{-(r_t + w_t)T}(r_t + w_t)\, dT = \frac{1}{r_t + w_t}$.

The relation between response times and probabilities here is based upon
the very simplest of assumptions. Clearly, the assumptions concerning the
hypothetical response rates and the relation between these rates and past
experience can be readily modified. Also, in practice, it is unlikely that the
general process, having six parameters, $r_1$, $w_1$, a, b, c, and d, would be used.
Special cases, with some of the parameters a, b, c, and d eliminated, or given
particular values, would be more commonly employed. An application of
such a special case to experimental data has been given elsewhere [1].
The relation between these stochastic processes and those suggested by
other investigators, in particular by Bush and Mosteller [4, 5] and Gulliksen
[8, 9], has been fully discussed in the previous paper [2]. However, one further
comparison is suggested by the present inquiry. Assumption (3), giving the
hypothetical response rates as linear functions of the previous number of
correct responses, can be shown to be equivalent to a system of linear operators
and is similar to the treatment of a situation with only one available response
given by Bush and Mosteller [3]. Thus expression (5) can be included within
a linear operator system if the operators are assumed to act not upon the
probability of a response but upon response rates hypothetically underlying
this probability.




Estimation of Parameters 


For brevity of exposition, consider the estimation of parameters for the
special case arising when b = c = 0 in (3), or equivalently $\alpha = \gamma_1$, $\beta = 0$
in (5). Thus, it is assumed that the effects of reward and punishment of a
response are confined to the response rate associated with this particular
response and do not generalize to the other. This is the stochastic equivalent
of the equation of the learning curve developed by Gulliksen [8, 9].

Consider the data obtained from a simple situation involving a choice
between two alternatives. We observe the sequence of choices made by a
subject as well as the response time for each trial. The response occurring
on the tth trial can be symbolized by a characteristic random variable, $X_t$.
If a correct response occurs, $X_t = 1$; if an incorrect response occurs, $X_t = 0$.
Similarly let $T_t$ be the response time at the tth trial. It should be borne in
mind, however, that the distribution of possible response times at each trial
is taken to be exponential and hence response times close to zero are considered
likely. Therefore $T_t$ should more properly be the difference between
the response time observed and the minimum response time found in the
experimental situation.


Suppose, then, that we have the results of n learning trials of an individual
subject, $X_t$ and $T_t$ ($t = 1, 2, \ldots, n$). At the tth trial, the probability of a
correct response at time $T_t$ is

$$e^{-(r_t + w_t)T_t}\, r_t\, \Delta T,$$

and the probability of an incorrect response at the same time is

$$e^{-(r_t + w_t)T_t}\, w_t\, \Delta T.$$

Hence the likelihood, $L_n$, of the entire sequence of responses and response
times is

(7) $L_n = \prod_{t=1}^{n} \left[e^{-(r_t + w_t)T_t}\, r_t^{X_t}\, w_t^{(1 - X_t)}\right]$.

We now seek those values of the parameters $r_1$, $w_1$, a, and d which
maximize $L_n$. It is more convenient to maximize

(8) $\lambda_n = \log L_n = \sum_{t=1}^{n} \left[-(r_t + w_t)T_t + X_t \log r_t + (1 - X_t) \log w_t\right]$.

Remembering that b = c = 0 is assumed, substitute for $r_t$ and $w_t$ from (3),
so that

(9) $\lambda_n = \sum_{t=1}^{n} \left[-(r_1 + w_1 + k_t a + f_t d)T_t + X_t \log (r_1 + k_t a) + (1 - X_t) \log (w_1 + f_t d)\right]$,

where $f_t = t - 1 - k_t$. Differentiating $\lambda_n$ with respect to $r_1$, $w_1$, a, and d
and setting the differentials equal to zero,

(10) $-\sum_t T_t + \sum_t \frac{X_t}{r_1 + k_t a} = 0$,

(11) $-\sum_t k_t T_t + \sum_t \frac{X_t k_t}{r_1 + k_t a} = 0$,

(12) $-\sum_t T_t + \sum_t \frac{1 - X_t}{w_1 + f_t d} = 0$,

(13) $-\sum_t f_t T_t + \sum_t \frac{(1 - X_t) f_t}{w_1 + f_t d} = 0$.
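The log-likelihood (9) that these equations maximize can be evaluated directly for any trial values of the parameters. The sketch below assumes the special case b = c = 0 and counts $k_t$ and $f_t$ from the responses on earlier trials; the function name and the two-trial data set are hypothetical.

```python
import math

def log_likelihood(X, T, r1, w1, a, d):
    """Log-likelihood of responses X (1 = correct) and response times T."""
    lam = 0.0
    k_t = 0                                 # correct responses before trial t
    for t, (x, time) in enumerate(zip(X, T), start=1):
        f_t = t - 1 - k_t                   # incorrect responses before trial t
        r_t = r1 + k_t * a
        w_t = w1 + f_t * d
        lam += -(r_t + w_t) * time
        lam += math.log(r_t) if x == 1 else math.log(w_t)
        k_t += x
    return lam

# Hypothetical record: a correct response after 2.0 time units,
# then an incorrect response after 1.0 time unit.
print(log_likelihood([1, 0], [2.0, 1.0], r1=1.0, w1=1.0, a=0.5, d=0.5))
```

A general-purpose maximizer could be applied to this function as a check on any solution obtained from the estimating equations.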


$r_1$ and $w_1$ can readily be eliminated. For example, consider (10) and (11).
Equation (11) may be rewritten as

(14) $\sum_t k_t T_t = \frac{1}{a} \sum_t X_t \left[1 - \frac{r_1}{r_1 + k_t a}\right]$.

It should be noted that $k_t = 0$ on the occasion of the first correct response,
and hence the summation in (11) extends over one less trial than that in
(10). Thus by appropriate substitution from (10),

$$\sum_t k_t T_t = \frac{1}{a}\left[(k - 1) - r_1\left(\sum_t T_t - \frac{1}{r_1}\right)\right],$$

where k is the total number of correct responses in the entire n learning
trials. Hence

(15) $r_1 = \frac{k - a\sum_t k_t T_t}{\sum_t T_t}$.

Similarly, it may be shown that

(16) $w_1 = \frac{n - k - d\sum_t f_t T_t}{\sum_t T_t}$.

Substituting these values for $r_1$ and $w_1$ in (11) and (13) two equations are
obtained, each in one unknown, namely

(17) $\sum_t \frac{X_t k_t \sum_u T_u}{k - a\sum_u k_u T_u + a k_t \sum_u T_u} - \sum_t k_t T_t = 0$,

     $\sum_t \frac{(1 - X_t) f_t \sum_u T_u}{(n - k) - d\sum_u f_u T_u + d f_t \sum_u T_u} - \sum_t f_t T_t = 0$.

These equations may appear formidable, but they are not difficult to set up
and can readily be solved by a numerical iterative procedure. Generally
a Taylor series expansion has been employed (e.g., see Whittaker and
Robinson [11]). Having found the appropriate estimates of a and d, (15)
and (16) give estimates of $r_1$ and $w_1$. If the alternative parameters for the
description of the choice sequence alone, p, $\alpha$, and $\gamma_2$, are required, the
appropriate substitutions are given by (4) and (5).
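A numerical solution for a can be sketched with a simple bisection in place of the Taylor series procedure mentioned above; bisection is a substitute chosen here for brevity, and it requires a bracket on which the estimating function changes sign. The function names and the three-trial data set are hypothetical. The data were constructed so that each response time equals its model mean with $r_1 = 1$ and a = 1, which makes the solution known in advance.

```python
def score_in_a(a, X, T):
    """Equation (11) with r_1 replaced by (15); returns (value, r_1)."""
    k_before = []                  # k_t: correct responses before each trial
    k_t = 0
    for x in X:
        k_before.append(k_t)
        k_t += x
    k = k_t                        # total number of correct responses
    sum_T = sum(T)
    sum_kT = sum(kc * t for kc, t in zip(k_before, T))
    r1 = (k - a * sum_kT) / sum_T
    lhs = sum(x * kc / (r1 + kc * a) for x, kc in zip(X, k_before))
    return lhs - sum_kT, r1

def solve_a(X, T, lo, hi, iters=80):
    """Bisection for a on [lo, hi]; needs a sign change on the bracket."""
    f_lo = score_in_a(lo, X, T)[0]
    f_hi = score_in_a(hi, X, T)[0]
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = score_in_a(mid, X, T)[0]
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    a_hat = 0.5 * (lo + hi)
    return a_hat, score_in_a(a_hat, X, T)[1]

# Hypothetical record: three correct responses with times 1, 1/2, 1/3.
X = [1, 1, 1]
T = [1.0, 0.5, 1.0 / 3.0]
a_hat, r1_hat = solve_a(X, T, lo=0.0, hi=2.5)
print(a_hat, r1_hat)
```

With an unbroken run of successes such as this one, d and $w_1$ are not estimated; the same approach would apply to the second equation of (17) when incorrect responses are present.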


Goodness of Fit 


The stochastic processes described above are nonstationary, and hence
no definitive answer to the problems of testing goodness of fit can be given.
There are two kinds of data with which the theoretical description can be
compared; each comparison presents rather different problems.

Some consideration of the sequence of choices made by a subject has
already been given [2]. It was suggested there that the most appropriate
procedure would be to determine the distribution of likelihoods of all the
possible sequences of length n, given the estimated parameters, and then to
compare the likelihood of the observed sequence with this distribution.
Unfortunately, as yet, we have been able to determine this distribution
only for the simplest case arising from (5), when $\alpha = \beta$, $\gamma_1 = \gamma_2 = 0$. Lacking
any proper statistical procedure, it would apparently be best to compare
visually the observed curve of cumulative successes against trial number
with a theoretical curve based upon the computed conditional probabilities
of success at each trial. Although this is not very satisfactory, it should give
some indication of any gross discrepancies between the theoretical description
and the experimental data.

In the case of the response times, some idea of the goodness of fit of the
stochastic process can be given in the following way. (I am indebted to Dr.
D. E. Barton of the Statistics Department, University College, London,
for this suggestion.) Having estimated the parameters, the theoretical mean
response time at each trial, $L_i$, is given by (6). Since the response times
are assumed to be distributed exponentially at each trial, the ratio of the
observed response time, $T_i$, to the theoretical mean time $L_i$ (i.e.,
$R_i = T_i/L_i$) should be itself distributed exponentially. Hence $\exp(-R_i)$ should
have a rectangular distribution in the region (0, 1). Thus the over-all theoretical
distribution of response times can be tested against the observed
data. Further, a plot of the transformations $R_i$ against the trial number $i$
should reveal any marked trends away from the stochastic description.
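The check on response times described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response times and theoretical means are made-up numbers, the function names are hypothetical, and the Kolmogorov-Smirnov distance is one conventional way (not named in the text) of comparing the transformed values with a rectangular distribution.

```python
import math

def exp_transform(times, means):
    """Return exp(-T_i / L_i) for observed times T_i and theoretical means L_i."""
    if len(times) != len(means):
        raise ValueError("times and means must have the same length")
    return [math.exp(-t / m) for t, m in zip(times, means)]

def ks_distance_from_uniform(u):
    """Largest gap between the empirical CDF of u and the uniform(0, 1) CDF."""
    xs = sorted(u)
    n = len(xs)
    d = 0.0
    for idx, x in enumerate(xs):
        # Compare the uniform CDF (x itself) with the step just above and below x.
        d = max(d, (idx + 1) / n - x, x - idx / n)
    return d

# Hypothetical data: observed response times and fitted mean times per trial.
observed_times = [4.1, 0.9, 2.6, 1.2, 0.4, 1.5]
theoretical_means = [3.0, 2.5, 2.0, 1.6, 1.3, 1.1]

u = exp_transform(observed_times, theoretical_means)
print("transformed values:", [round(v, 3) for v in u])
print("KS distance from rectangular:", round(ks_distance_from_uniform(u), 3))
```

A small distance is compatible with the exponential assumption; plotting the ratios against trial number, as the text suggests, addresses trends that a single summary distance cannot show.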


Conclusion 


It is apparent that answers to the problems of goodness of fit are not 
very satisfactory. In spite of this, it is suggested that the general approach
presented here has some value for the description of experimental data. The 
procedures given should be sufficient for the comparison of learning behavior 
occurring under different experimental conditions. Furthermore, the kinds of 
assumptions underlying the stochastic description make it possible to introduce
assumptions concerning the influence of other variables upon learning
behavior. In particular, a consideration of the relation between the hypothetical
response rates and conditions of motivation might be of some interest.

The basic assumptions may also be modified easily, without changing
the general form of the mathematical development. Other theoretical descriptions
of learning behavior, therefore, might be readily put into the form
suggested by the present paper so that their formulation and verification
could be carried out with greater precision.


REFERENCES

[1] Audley, R. J. A stochastic description of the learning behaviour of an individual
subject. Quart. J. exp. Psychol., 1957, 9, 12-20.
[2] Audley, R. J. and Jonckheere, A. R. Stochastic processes for learning. Brit. J. statist.
Psychol., 1956, 9, 87-94.
[3] Bush, R. R. and Mosteller, F. A mathematical model for simple learning. Psychol.
Rev., 1951, 58, 313-323.
[4] Bush, R. R. and Mosteller, F. A stochastic model with applications to learning. Ann.
math. Statist., 1953, 24, 559-585.
[5] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955.
[6] Christie, L. S. The measurement of discriminative behavior. Psychol. Rev., 1952, 59,
443-452.
[7] Feller, W. An introduction to probability theory and its applications. New York: Wiley,
1950.
[8] Gulliksen, H. A rational equation of the learning curve based on Thorndike's laws
of effect. J. gen. Psychol., 1934, 11, 305-434.
[9] Gulliksen, H. A generalization of Thurstone's learning function. Psychometrika, 1953,
18, 297-307.
[10] Mueller, C. G. Theoretical relationships among some measures of conditioning. Proc.
nat. Acad. Sci., 1950, 36, 123-130.
[11] Whittaker, E. T. and Robinson, G. The calculus of observations. London: Blackie, 1924.


Manuscript received 11/23/56
Revised manuscript received 4/22/5?


PSYCHOMETRIKA—VOL, 23, NO. 1 
MARCH, 1958


DETERMINING THE DEGREE OF INCONSISTENCY IN A SET OF 
PAIRED COMPARISONS* 


HAROLD B. GERARD
BELL TELEPHONE LABORATORIES


AND


HAROLD N. SHAPIRO
NEW YORK UNIVERSITY


Consistency in paired comparison data is defined. Two types of inconsistency
which may arise are defined. Computational formulas for these
types of inconsistency are derived, and examples illustrating the use of these
formulas are presented.


In a recent experiment [1], the authors were concerned with obtaining
a measure of S's psychological certainty concerning the probable success
of some future undertaking. After exposure to the experimental manipulations,
E presented S with seven 5 × 8 index cards with a different odds
for success printed on each card. The stimuli presented were: 10 to 1, 5 to
1, 2 to 1, 1 to 1, 1 to 2, 1 to 5, 1 to 10. All possible pairs of stimuli were presented,
and S was asked to select the member of each pair which better
reflected what he thought his chances were.

From this set of data a measure of both subjective probability of success
and S's degree of certainty regarding his estimate was desired. This problem
is typical of many in which stimulus comparisons are made. What is presented
in this paper is a method for analyzing the consistency of response in such
experiments. The method, which involves matrix arithmetic, is quite difficult
to formulate in all generality; presented here is a complete analysis of a
special case.


The Approach

Let the stimulus cards appear as points $P_1, P_2, \ldots, P_n$ on a line which
represents a continuum of subjective probability. Let $X$ represent the position
of the individual on the line, i.e., his actual subjective probability:

(1) [Diagram: the points $P_1, P_2, \ldots, P_n$ marked along a line, with the position $X$ indicated.]

*The ideas were developed while the first author was on the staff of the Research
Center [illegible] and were made possible by the ONR through contract Nonr
285(10). The authors are indebted to Jack Moshman for his [illegible].
The United States Government is authorized to reprint this article in whole or in part.



Consider the points, $P_i$ and $P_j$, and the question, to which of $P_i$ and $P_j$ is
$X$ nearer? If $P_i$ is nearer to $X$ than $P_j$, write $a_{ij} = +1$. If $P_j$ is nearer to $X$
than $P_i$, write $a_{ij} = -1$. Define $a_{ii} = 0$. For all $i$ and $j$, $a_{ij} = -a_{ji}$.

All of the paired comparisons of the set of points may be tabulated in
a square $n \times n$ matrix $A = (a_{ij})$. This matrix has 0 in the principal diagonal.
If the row element is closer to $X$ than the column element, the entry is $+1$;
contrariwise the entry is $-1$. Since $a_{ij} = -a_{ji}$ the matrix $A$ is
skew symmetric.
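As a concrete illustration of this tabulation, the following Python sketch builds the matrix of $+1$ and $-1$ entries from a set of point positions and a position for $X$. The function name `answer_matrix` and the coordinates are assumptions introduced only for this example; ties in distance are rejected because the text does not define an entry for them.

```python
def answer_matrix(points, x):
    """Return the skew symmetric matrix with a[i][j] = +1 when points[i] is
    nearer to x than points[j], -1 when it is farther, and 0 on the diagonal."""
    n = len(points)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_i = abs(points[i] - x)
            d_j = abs(points[j] - x)
            if d_i == d_j:
                raise ValueError("tied distances leave the entry undefined")
            a[i][j] = 1 if d_i < d_j else -1
    return a

# Hypothetical positions for seven stimuli, with X at the origin.
positions = [-3, -1, 2, 4, 5, 6, 7]
A = answer_matrix(positions, 0)
for row in A:
    print(" ".join(f"{v:+d}" if v else " 0" for v in row))
```

With these particular coordinates $X$ lies between the second and third points and closer to the second, so the second row has $+1$ in every off-diagonal position.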


The Development 


Definition. An answer matrix is a skew symmetric $n \times n$ matrix where
entries off the main diagonal are all $+1$ or $-1$.

Definition. The set of responses, or the answer matrix $A$, is called inconsistent
if there exists no possible determination of distances between
the $P_i$, and no possible placement of $X$ in (1) for which $A$ is the answer
matrix. If some, not necessarily unique, determination of these distances
and placement of $X$ is possible then $A$ is called consistent.

Definition. For each $i$, $1 \le i \le n$, and a given answer matrix $A$, define
$\lambda_i = \lambda_i(A)$ as the smallest subscript $\lambda_i > i$ such that $a_{i\lambda_i} = +1$. If $\lambda_i$ does
not exist properly define $\lambda_i = \infty$.

Definition. Define $p = p(A)$ as the position index of an answer matrix
$A$ as $p = \min_i \lambda_i$. Note that it is possible that $p = \infty$, i.e., $\lambda_i = \infty$ for all
$i$, $1 \le i \le n$.


THEOREM. The necessary and sufficient conditions for an answer matrix
$A$ to be consistent are that $p(A) = \infty$, or that $p(A) < \infty$ and there exists a $k$,
$1 \le k < n$, such that

(i) $p = p(A) = \lambda_k = k + 1$,

(ii) $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$,

(iii) $\lambda_i = i + 1$ for $k < i < n$, and $\lambda_i = \infty$ for $i = n$,

(iv) $a_{ij} = +1$ for $j \ge \lambda_i$.


These conditions assert that in order to be consistent the skew symmetric
matrix $A$ has two connected regions of entries above the principal diagonal,
one of $+1$'s and the other of $-1$'s, as pictured in Fig. 1. The boundary or
demarcation line between the regions appears as "steps" going up and to
the right. The case where $p = \infty$ is the degenerate case wherein there are
no $+1$'s above the diagonal.


FIGURE 1

Pictorial Representation of the Conditions for Consistency of Matrix A


An examination of the separation diagram (Fig. 1) is in practice the quick- 
est way of deciding whether or not the response matrix is consistent. 
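The four conditions of the theorem lend themselves to a mechanical check. The sketch below is one possible reading of them in Python, not the authors' procedure: the function names are hypothetical, subscripts are kept 1-based inside `lambdas` to match the text, and the input is assumed to be a valid answer matrix. The first test matrix is a consistent one; the second differs only in the entries for the last two stimuli.

```python
import math

def lambdas(a):
    """lambda_i (1-based): first column j > i with a_ij = +1, or infinity."""
    n = len(a)
    lam = []
    for i in range(n):
        value = math.inf
        for j in range(i + 1, n):
            if a[i][j] == 1:
                value = j + 1          # convert 0-based column to 1-based
                break
        lam.append(value)
    return lam

def is_consistent(a):
    """Check conditions (i)-(iv) of the consistency theorem."""
    n = len(a)
    lam = lambdas(a)
    p = min(lam)
    if p == math.inf:                  # degenerate case: no +1 above the diagonal
        return True
    k = p - 1                          # condition (i) forces p = k + 1
    if lam[k - 1] != p:                # (i) lambda_k = k + 1
        return False
    for i in range(1, k):              # (ii) lambda_1 >= ... >= lambda_k
        if lam[i - 1] < lam[i]:
            return False
    for i in range(k + 1, n):          # (iii) lambda_i = i + 1 for k < i < n
        if lam[i - 1] != i + 1:
            return False
    for i in range(n):                 # (iv) a_ij = +1 for every j >= lambda_i
        if lam[i] != math.inf:
            for j in range(lam[i], n + 1):
                if a[i][j - 1] != 1:
                    return False
    return True

A = [
    [ 0, -1, -1,  1,  1,  1,  1],
    [ 1,  0,  1,  1,  1,  1,  1],
    [ 1, -1,  0,  1,  1,  1,  1],
    [-1, -1, -1,  0,  1,  1,  1],
    [-1, -1, -1, -1,  0,  1,  1],
    [-1, -1, -1, -1, -1,  0,  1],
    [-1, -1, -1, -1, -1, -1,  0],
]
A1 = [row[:] for row in A]
A1[5][6], A1[6][5] = -1, 1             # reverse the answer for the last pair

print(lambdas(A), is_consistent(A))
print(lambdas(A1), is_consistent(A1))
```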


PROOF. From the definition of $p$ it follows that, when $p = \infty$, all the
entries above the principal diagonal are $-1$, and hence those below this
diagonal are all $+1$. But this is precisely the answer matrix which corresponds
to placing the point $X$ to the right of the last point $P_n$ in (1) or closer
to $P_n$ than to $P_{n-1}$. In the following, then, we may restrict ourselves to the
case $p < \infty$, i.e., to those cases where $X$ is closer to $P_{n-1}$ than to $P_n$.

Necessity. Select a $k$ such that $X$ lies between $P_k$ and $P_{k+1}$. We may
assume that $X$ is closer to $P_k$ than to $P_{k+1}$. (If not, $X$ could be placed between
$P_{k+1}$ and $P_{k+2}$, and closer to $P_{k+1}$, without changing any of the answers which
determine the entries in the matrix $A$. This would replace the role of $k$ by
$k + 1$, and $X$ would be nearer $P_{k+1}$ than $P_{k+2}$.)


(2) [Diagram: the points $P_1, \ldots, P_k, P_{k+1}, \ldots, P_n$ on a line, with $X$ between $P_k$ and $P_{k+1}$ and closer to $P_k$.]

In (2), $a_{ij} = -1$ for $i < j \le k$, which implies that $\lambda_i > k$ for $i = 1, 2, \ldots, k - 1$.
Since $a_{k,k+1} = +1$, $\lambda_k = k + 1$. Also for $j > i > k$ it is clear that
$a_{ij} = +1$, so that $\lambda_i = i + 1$. Thus $p = \lambda_k = k + 1$ and conditions (i) and
(iii) are established.

In addition to knowing that $X$ is between $P_k$ and $P_{k+1}$ and closer to
$P_k$, how much additional information is necessary to determine completely
the answer matrix $A$? Suppose, for each $i$, $i < k$, it is known which is
the first $P_j$, $j > i$, such that $X$ is closer to $P_i$ than to $P_j$. If $P_{\mu_i}$ is this point
it is clear that $a_{ij} = -1$ for $i < j < \mu_i$ and $a_{ij} = +1$ for $j \ge \mu_i$, so that $A$ is
completely determined. Clearly also $\lambda_i = \mu_i$ for $i < k$.
Since it is immediate from the definition of $P_{\mu_i}$ that

$$\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{k-1},$$

therefore (ii) follows. From what has been given above (iv) is also immediate.
This completes the proof of necessity.

Sufficiency. We wish to show that if an answer matrix $A$ satisfies the
conditions (i), (ii), (iii), and (iv), we can find a configuration of the distances
between the $P_i$ and a position for $X$ in (1) which realizes it. Again
assume that $p(A) < \infty$. Let $p(A) = k + 1$. Place $P_k$ and $P_{k+1}$ on a line with
$X$ between them and closer to $P_k$, as in (2). Consider $\lambda_{k-1} \ge k + 1$.
If $\lambda_{k-1} = k + 1$, place $P_{k-1}$ close to $P_k$ such that

$$P_{k-1}X < XP_{k+1}$$

(where $PQ$ denotes the length of the line segment $PQ$). If $\lambda_{k-1} > k + 1$,
place $P_{\lambda_{k-1}}$ to the right of $P_{k+1}$ and $P_{k-1}$ to the left of $P_k$ such that the following
inequalities are satisfied:

$$P_{k-1}X < XP_{\lambda_{k-1}},$$

$$P_{k-1}X > XP_{\lambda_{k-1}-1}.$$

Next consider $\lambda_{k-2} \ge \lambda_{k-1}$. If $\lambda_{k-2} = \lambda_{k-1}$, place $P_{k-2}$ to the left of, and
close to, $P_{k-1}$ such that

$$P_{k-2}X < XP_{\lambda_{k-1}}.$$

If $\lambda_{k-2} > \lambda_{k-1}$, place $P_{k-2}$ to the left of $P_{k-1}$ and $P_{\lambda_{k-2}}$ to the right of $P_{\lambda_{k-1}}$
so that

$$P_{k-2}X < XP_{\lambda_{k-2}}.$$

If this process stops at a $\lambda_i = \infty$, choose $P_i$ far enough to the left so that

$$P_iX > XP_{\lambda_{i+1}},$$

and place the remaining $P_j$, $j < i$, to the left of $P_i$, and the remaining $P_h$,
$h > \lambda_{i+1}$, close to and to the right of $P_{\lambda_{i+1}}$ such that the last point $P_n$
satisfies

$$P_iX > XP_n.$$

The resulting configuration clearly has $A$ as its answer matrix. This completes
the proof of the sufficiency.

Fundamental types of inconsistency

An answer matrix $A$ may be inconsistent for a variety of reasons. Consider
two simple reasons which we designate as fundamental.

Intransitivity. Suppose we have a triplet of subscripts $i, j, k$ such that
$a_{ij} = +1$, $a_{jk} = +1$, $a_{ik} = -1$. Then the answer matrix is inconsistent,
for if the matrix $A$ is realized by the set of points, $P_i$, and a position for $X$,
we would have $P_i$, $P_j$, $P_k$ as three distinct points with

$$|P_i - X| < |P_j - X| \quad (\text{from } a_{ij} = +1),$$

$$|P_j - X| < |P_k - X| \quad (\text{from } a_{jk} = +1),$$

and

$$|P_k - X| < |P_i - X| \quad (\text{from } a_{ik} = -1).$$

But the first two inequalities imply $|P_i - X| < |P_k - X|$ in contradiction
to the third. Thus the matrix $A$ cannot be realized. An inconsistency manifested
in this way is called an intransitivity.

From the skew symmetry of an answer matrix, the triplet described
above implies also that

$$a_{jk} = +1, \quad a_{ki} = +1, \quad a_{ji} = -1;$$

and

$$a_{ki} = +1, \quad a_{ij} = +1, \quad a_{kj} = -1.$$

That is, there are apparently three intransitivities generated. In the following
we shall count these as one intransitivity involving the triplet of subscripts
$i, j, k$.
Separation. Suppose we have a triplet of subscripts $i < j < k$ such that
$a_{ij} = +1$, $a_{jk} = -1$; then the answer matrix $A$ is inconsistent. From
$a_{ij} = +1$, $X$ must be to the left of $P_j$, and from $a_{jk} = -1$, $X$ must be to
the right of $P_j$, which is impossible. An inconsistency manifested by a
triplet $i, j, k$ such that $i < j < k$ and $a_{ij} = +1$, $a_{jk} = -1$, is called a
separation.

It is important to note that the cause of an intransitivity is independent
of the ordering property of the points, whereas separations are intimately
connected with some assumed order requirement. It is not true that
intransitivities and separation errors are the only errors which
characterize inconsistency in a set of paired comparisons. It is, however, true that
inconsistency as herein defined will result in at least one intransitivity or
separation. The concepts of separation and intransitivity are independent
in the sense that there exist answer matrices possessing one without the
other.

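Both fundamental types of inconsistency can be counted directly from their definitions by examining every triplet of subscripts. The Python sketch below does this by brute force; the function names and the small matrices are assumptions made for illustration. It takes time proportional to $n^3$, which is of no concern for the handful of stimuli used in experiments of this kind.

```python
from itertools import combinations

def count_intransitivities(a):
    """Count triplets {i, j, k} whose three answers form a cycle."""
    total = 0
    for i, j, k in combinations(range(len(a)), 3):
        # All three +1 is the cycle i, j, k; all three -1 is the reverse cycle.
        if a[i][j] == a[j][k] == a[k][i]:
            total += 1
    return total

def count_separations(a):
    """Count triplets i < j < k with a_ij = +1 and a_jk = -1."""
    total = 0
    for i, j, k in combinations(range(len(a)), 3):
        if a[i][j] == 1 and a[j][k] == -1:
            total += 1
    return total

# A single intransitive triplet and a single separating triplet (made-up data).
cycle = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
split = [[0, 1, 1], [-1, 0, -1], [-1, 1, 0]]

print(count_intransitivities(cycle), count_separations(cycle))
print(count_intransitivities(split), count_separations(split))
```

The two matrices show the independence of the two notions: the first contains an intransitivity without a separation, and the second a separation without an intransitivity.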


Characterization of consistency

THEOREM. An answer matrix is consistent if and only if it contains no
intransitivities or separations.

PROOF. The "only if" part of the theorem is quite trivial since the
presence of an intransitivity or separation renders an answer matrix
inconsistent. On the other hand, assume the answer matrix has no intransitivities
or separations. We will prove that it satisfies the conditions (i),
(ii), (iii), (iv). Clearly we may assume that $p(A) < \infty$, since if $p(A) = \infty$
the matrix is consistent.

Let the $r$th column be the first column such that a $+1$ appears above
the main diagonal; $r > 1$, and it exists since we assume that $p(A) < \infty$. We
will first prove that above the main diagonal all $-1$'s in the $k$th column
are above all $+1$'s. For if otherwise,

$$a_{ik} = +1, \quad a_{jk} = -1, \quad i < j < k,$$

or

$$a_{ik} = +1, \quad a_{kj} = +1.$$

Since there are no intransitivities this implies $a_{ij} = +1$. But then $a_{ij} = +1$,
$a_{jk} = -1$, $i < j < k$, is a separation, which is impossible.

The above argument demonstrates that above the main diagonal $-1$'s
and $+1$'s group themselves as required. Now in the $r$th column (the first
column with a $+1$ above the main diagonal) we must have $a_{r-1,r} = +1$, so
that $\lambda_{r-1} = r$. Thus there remains to prove only that

(3) $p = \lambda_{r-1} \le \lambda_{r-2} \le \cdots \le \lambda_1$,

since it then follows that $p(A) = \lambda_{r-1} = r$, so that $k = r - 1$, and $\lambda_i = i + 1$ for $i \ge r$. Suppose
that (3) is false, i.e., for some $k$, $r - 2 \ge k \ge 1$, $\lambda_k < \lambda_{k+1}$. Then $a_{k\lambda_k} = +1$,
$a_{k+1,\lambda_k} = -1$ since $\lambda_{k+1} > \lambda_k \ge k + 2 > k + 1$. But this contradicts the fact
that above the main diagonal $-1$'s are above $+1$'s in each column. We must
also verify condition (iv) of the consistency theorem, i.e., that for $j > \lambda_i$,
$a_{ij} = +1$. Suppose that this is false, i.e., for some $j > \lambda_i$, $a_{ij} = -1$. Since
$a_{i\lambda_i} = +1$, $j > \lambda_i$, then $a_{ji} = +1$, and since there are no intransitivities
$a_{j\lambda_i} = +1$. It follows that $a_{\lambda_i j} = -1$ and $a_{i\lambda_i} = +1$, which is a separation.
Thus the conditions of the consistency theorem are satisfied and $A$ is consistent.

Since the notions of intransitivity and separation lie at the basis of the
degree to which an answer matrix can be said to be inconsistent, we next
consider the question of determining the number of intransitivities and
separations.

Number of intransitivities 


Let $T$ = the number of intransitivities in an answer matrix $A$; $R_k$, the
sum of entries in the $k$th row of $A$.

THEOREM.

$$T = \frac{1}{24}\Bigl[n(n^2 - 1) - 3\sum_{k=1}^{n} R_k^2\Bigr].$$

PROOF. For convenience, introduce $C_j = \sum_i a_{ij}$ = sum of the entries
in the $j$th column of $A$. Since $a_{ij} = -a_{ji}$,

(4) $C_j = -R_j$.

Next consider for a given pair $i, j$ ($i \ne j$) the number of $k$ such that $k \ne i, j$, and

(5) $a_{ik} = +1$, $a_{kj} = +1$, $a_{ij} = -1$,

or

$a_{jk} = +1$, $a_{ki} = +1$, $a_{ji} = -1$.

Let

$N_{++}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = +1$, $a_{kj} = +1$;

$N_{--}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = -1$, $a_{kj} = -1$;

$N_{+-}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = +1$, $a_{kj} = -1$;

$N_{-+}^{(i,j)}$ = no. of $k$, $1 \le k \le n$, such that $a_{ik} = -1$, $a_{kj} = +1$.

The superscript $(i, j)$ is omitted in what follows.
Denote by $Z$ the $n \times n$ matrix with zeros on the main diagonal and all
other entries $+1$. For any matrix $U$, $[U]_{i,j}$ denotes the entry in the $i$th row
and $j$th column of $U$. Then, for $i \ne j$:

(i) $n - 2 = N_{++} + N_{+-} + N_{-+} + N_{--}$,

(ii) $[AZ]_{i,j} = N_{++} + N_{+-} - N_{-+} - N_{--}$,

(iii) $[ZA]_{i,j} = N_{++} - N_{+-} + N_{-+} - N_{--}$,

and

(iv) $[A^2]_{i,j} = N_{++} - N_{+-} - N_{-+} + N_{--}$.

To observe (ii), note that for $N_+$ = number of $k$ such that $a_{ik} = +1$, and $N_-$
= number of $k$ such that $a_{ik} = -1$ (with $k \ne i, j$), we have $[AZ]_{i,j} = N_+ - N_-$,
where $N_+ = N_{++} + N_{+-}$ and $N_- = N_{-+} + N_{--}$.
Note in addition that

(6) $[AZ]_{i,j} = R_i - a_{ij}$, $\quad [ZA]_{i,j} = C_j - a_{ij}$.

Adding (ii) and (iii), then (i) and (iv),

(v) $[AZ]_{i,j} + [ZA]_{i,j} = 2(N_{++} - N_{--})$

and

(vi) $\tfrac{1}{2}\bigl(n - 2 + [A^2]_{i,j}\bigr) = N_{++} + N_{--}$.

Now if $N_{++} \ne 0$ there exists a $k$ such that $a_{ik} = +1$, $a_{kj} = +1$, so that in
order to have consistency $a_{ij}$ would have to be $+1$. On the other hand, if
$N_{--} \ne 0$ there exists a $k$ such that $a_{ik} = -1$, $a_{kj} = -1$ or $a_{jk} = +1$,
$a_{ki} = +1$, and consistency would require that $a_{ji} = +1$ or $a_{ij} = -1$.
Therefore if $a_{ij} = +1$, the number of intransitivities involving $i, j$ as in
(5) equals $N_{--}$; if $a_{ij} = -1$, the number of intransitivities involving $i, j$
as in (5) equals $N_{++}$. Thus, in any event, the number of intransitivities
involving $i, j$ as in (5) equals

$$\tfrac{1}{2}\bigl[(1 + a_{ij})N_{--} + (1 - a_{ij})N_{++}\bigr] = \tfrac{1}{2}\bigl[(N_{++} + N_{--}) - a_{ij}(N_{++} - N_{--})\bigr].$$

Using (v), (vi), this in turn equals

$$\tfrac{1}{4}\bigl[n - 2 + [A^2]_{i,j} - a_{ij}\bigl([AZ]_{i,j} + [ZA]_{i,j}\bigr)\bigr].$$

From (6) this may be rewritten

$$\mu_{ij} = \tfrac{1}{4}\Bigl[(n - 2) + \sum_k a_{ik}a_{kj} - a_{ij}(R_i + C_j - 2a_{ij})\Bigr].$$

Also,

$$T = \tfrac{1}{6}\sum_{i \ne j} \mu_{ij}.$$

The factor 1/6 arises since in $\mu_{ij}$, $i$ and $j$ have symmetrical roles so that
in the sum over all unequal $i, j$, each comparison is counted twice. Also an
extra factor of 1/3 is introduced since we do not count as distinct a "permutation"
of an intransitivity. Since for $i \ne j$, $a_{ij}^2 = 1$ we may rewrite

$$\mu_{ij} = \tfrac{1}{4}\Bigl[n + \sum_k a_{ik}a_{kj} - a_{ij}(R_i + C_j)\Bigr].$$

Now,

$$\sum_{i \ne j}\sum_k a_{ik}a_{kj} = \sum_{i,j}\sum_k a_{ik}a_{kj} + n(n - 1) = \sum_k C_kR_k + n(n - 1) = -\sum_k R_k^2 + n(n - 1).$$

Also,

$$\sum_{i \ne j} a_{ij}(R_i + C_j) = \sum_{i,j} a_{ij}(R_i + C_j) = \sum_i R_i^2 + \sum_j C_j^2 = 2\sum_k R_k^2.$$

Finally, since $\sum_{i \ne j} n = n^2(n - 1)$,

$$T = \tfrac{1}{6}\sum_{i \ne j}\mu_{ij} = \tfrac{1}{24}\Bigl[n^2(n - 1) + n(n - 1) - \sum_k R_k^2 - 2\sum_k R_k^2\Bigr] = \tfrac{1}{24}\Bigl[n(n^2 - 1) - 3\sum_k R_k^2\Bigr],$$

which establishes the theorem. The formula easily may be transformed
into a result given by Kendall ([2], p. 156) for what he calls circular triads.
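The closed formula for $T$ needs only the row sums, so it can be compared against a direct count of intransitive triplets. The following Python sketch makes that comparison under stated assumptions: the function names are hypothetical, and the test matrix is a made-up five-stimulus case in which each stimulus is judged nearer than the next two in cyclic order, so that every row sum is zero.

```python
from itertools import combinations

def intransitivities_by_formula(a):
    """T = [n(n^2 - 1) - 3 * sum of squared row sums] / 24."""
    n = len(a)
    sum_of_squares = sum(sum(row) ** 2 for row in a)
    return (n * (n * n - 1) - 3 * sum_of_squares) // 24

def intransitivities_by_count(a):
    """Direct count of triplets whose three answers form a cycle."""
    return sum(
        1
        for i, j, k in combinations(range(len(a)), 3)
        if a[i][j] == a[j][k] == a[k][i]
    )

def cyclic_matrix(n):
    """Hypothetical answer matrix: i is 'nearer' than i+1 and i+2 (mod n)."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for step in (1, 2):
            j = (i + step) % n
            a[i][j] = 1
            a[j][i] = -1
    return a

M = cyclic_matrix(5)
print(intransitivities_by_formula(M), intransitivities_by_count(M))
```

Because the formula avoids the triple loop, it is the cheaper of the two for hand or machine computation; the direct count remains useful when the offending triplets themselves must be listed.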


Number of separations 
By a method similar to that used above, a formula may be obtained for
the number of separations in an answer matrix $A$. Let $\Delta$ = matrix resulting
from $A$ by making all entries below the main diagonal equal to zero, and
write

$$\Delta = (d_{ij}),$$

so that $d_{ij} = 0$ for $i \ge j$ and $d_{ij} = a_{ij}$ for $i < j$. Also let

$\bar{C}_k$ = sum of the entries of the $k$th column of $\Delta$;

$\bar{R}_k$ = sum of the entries of the $k$th row of $\Delta$;

$S = S(A)$ = number of separations in $A$.

THEOREM.

$$S = \frac{1}{4}\Bigl[\frac{n(n - 1)(n - 2)}{6} + \sum_k (n - k)\bar{C}_k - \sum_k (k - 1)\bar{R}_k - \sum_k \bar{R}_k\bar{C}_k\Bigr].$$

PROOF. Let $\bar{Z} = (\bar{z}_{ij})$ be the $n \times n$ matrix with $+1$ above the main
diagonal and zero elsewhere. We propose to count for a fixed pair, $i, j$, $i < j$,
the number of $k$, $i < k < j$, such that $a_{ik} = +1$, $a_{kj} = -1$.

(7) $[\Delta\bar{Z}]_{i,j} = \sum_{j>k>i} a_{ik} = N_{++} + N_{+-} - N_{-+} - N_{--}$,

where $N_{++}^{(i,j)}$ = no. of $k$, $i < k < j$, such that $a_{ik} = +1$, $a_{kj} = +1$ and the
others are defined analogously. Where no confusion is possible the superscript
$(i, j)$ is omitted.

(8) $[\bar{Z}\Delta]_{i,j} = \sum_{j>k>i} a_{kj} = N_{++} - N_{+-} + N_{-+} - N_{--}$,

(9) $j - 1 - i = N_{++} + N_{+-} + N_{-+} + N_{--}$,

(10) $[\Delta^2]_{i,j} = \sum_{i<k<j} a_{ik}a_{kj} = N_{++} - N_{+-} - N_{-+} + N_{--}$.

To solve for $N_{+-}$,

$$[\Delta\bar{Z}]_{i,j} - [\bar{Z}\Delta]_{i,j} = 2(N_{+-} - N_{-+}),$$

$$j - 1 - i - [\Delta^2]_{i,j} = 2(N_{+-} + N_{-+}),$$

so that

(11) $N_{+-} = \tfrac{1}{4}\bigl[[\Delta\bar{Z}]_{i,j} - [\bar{Z}\Delta]_{i,j} + j - i - 1 - [\Delta^2]_{i,j}\bigr]$,

and the total number of separations

(12) $S = \sum_{i<j} N_{+-}^{(i,j)}$.

Note that

(13) $\sum_{i<j} [\Delta\bar{Z}]_{i,j} = \sum_k (n - k)\sum_{i<k} a_{ik} = \sum_k (n - k)\bar{C}_k$,

(14) $\sum_{i<j} [\bar{Z}\Delta]_{i,j} = \sum_k (k - 1)\sum_{j>k} a_{kj} = \sum_k (k - 1)\bar{R}_k$,

(15) $\sum_{i<j} (j - i - 1) = \dfrac{n(n - 1)(n - 2)}{6}$,

(16) $\sum_{i<j} [\Delta^2]_{i,j} = \sum_k \Bigl(\sum_{i<k} a_{ik}\Bigr)\Bigl(\sum_{j>k} a_{kj}\Bigr) = \sum_k \bar{R}_k\bar{C}_k$.

Combining the ten equations, (7) through (16), yields

$$S = \frac{1}{4}\Bigl[\frac{n(n - 1)(n - 2)}{6} + \sum_k (n - k)\bar{C}_k - \sum_k (k - 1)\bar{R}_k - \sum_k \bar{R}_k\bar{C}_k\Bigr].$$

This result completes the proof.
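The separation formula uses only the row and column sums of the part of the matrix above the main diagonal. The Python sketch below evaluates it and compares it with a direct count; the function names are hypothetical, and the seven-stimulus matrix is an assumed example in which the answer for the last pair of stimuli has been reversed relative to a consistent matrix.

```python
from itertools import combinations

def separations_by_formula(a):
    """S = [n(n-1)(n-2)/6 + sum (n-k) Cbar_k - sum (k-1) Rbar_k - sum Rbar_k Cbar_k] / 4."""
    n = len(a)
    # Row and column sums restricted to entries above the main diagonal.
    r_bar = [sum(a[k][j] for j in range(k + 1, n)) for k in range(n)]
    c_bar = [sum(a[i][k] for i in range(k)) for k in range(n)]
    total = n * (n - 1) * (n - 2) // 6
    total += sum((n - (k + 1)) * c_bar[k] for k in range(n))   # (n - k), k 1-based
    total -= sum(k * r_bar[k] for k in range(n))               # (k - 1), k 1-based
    total -= sum(r_bar[k] * c_bar[k] for k in range(n))
    return total // 4

def separations_by_count(a):
    """Direct count of triplets i < j < k with a_ij = +1 and a_jk = -1."""
    return sum(
        1
        for i, j, k in combinations(range(len(a)), 3)
        if a[i][j] == 1 and a[j][k] == -1
    )

A1 = [
    [ 0, -1, -1,  1,  1,  1,  1],
    [ 1,  0,  1,  1,  1,  1,  1],
    [ 1, -1,  0,  1,  1,  1,  1],
    [-1, -1, -1,  0,  1,  1,  1],
    [-1, -1, -1, -1,  0,  1,  1],
    [-1, -1, -1, -1, -1,  0, -1],
    [-1, -1, -1, -1, -1,  1,  0],
]

print(separations_by_formula(A1), separations_by_count(A1))
```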


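Consideration of the following three examples will clarify the application
of the formulas. In all three examples, $n = 7$:


Example I

$$
\begin{array}{r|rrrrrrr|rr}
k & & & & A & & & & R_k & \bar{R}_k \\ \hline
1 & 0 & -1 & -1 & +1 & +1 & +1 & +1 & 2 & 2 \\
2 & +1 & 0 & +1 & +1 & +1 & +1 & +1 & 6 & 5 \\
3 & +1 & -1 & 0 & +1 & +1 & +1 & +1 & 4 & 4 \\
4 & -1 & -1 & -1 & 0 & +1 & +1 & +1 & 0 & 3 \\
5 & -1 & -1 & -1 & -1 & 0 & +1 & +1 & -2 & 2 \\
6 & -1 & -1 & -1 & -1 & -1 & 0 & +1 & -4 & 1 \\
7 & -1 & -1 & -1 & -1 & -1 & -1 & 0 & -6 & 0 \\ \hline
\bar{C}_k & 0 & -1 & 0 & 3 & 4 & 5 & 6 & &
\end{array}
$$

$$\frac{n(n - 1)(n - 2)}{6} = 35;$$

$$\sum_k (n - k)\bar{C}_k = 0 - 5 + 0 + 9 + 8 + 5 + 0 = 17;$$

$$\sum_k (k - 1)\bar{R}_k = 0 + 5 + 8 + 9 + 8 + 5 + 0 = 35;$$

$$\sum_k \bar{R}_k\bar{C}_k = 0 - 5 + 0 + 9 + 8 + 5 + 0 = 17;$$

$$S = \tfrac{1}{4}(35 + 17 - 35 - 17) = 0;$$

$$\sum_k R_k^2 = 112; \qquad T = \tfrac{1}{24}\bigl(7(48) - 3(112)\bigr) = 0.$$

In this matrix there are no separations or intransitivities. The matrix $A$ is
consistent. A clear demarcation line exists above the diagonal separating
the $+1$ and $-1$ entries. This boundary appears as steps going up and to
the right. The matrix is realized by

[Diagram: a placement of $P_1, \ldots, P_7$ and $X$ on a line which realizes $A$.]

This realization is certainly not unique. There are many possible realizations
which meet the criteria, namely the set of inequalities which must be satisfied.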
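Example II


This exemplifies an answer matrix with separations and no intransitivities.
Form the matrix $A_1$ by changing $A$ of Example I so that $a_{67} = -1$
and $a_{76} = +1$. The sums then are

$$
\begin{array}{l|rrrrrrr}
k & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
R_k & 2 & 6 & 4 & 0 & -2 & -6 & -4 \\
\bar{R}_k & 2 & 5 & 4 & 3 & 2 & -1 & 0 \\
\bar{C}_k & 0 & -1 & 0 & 3 & 4 & 5 & 4
\end{array}
$$

Clearly $T = 0$. Now compute $S$.

$$\sum_k (n - k)\bar{C}_k = 0 - 5 + 0 + 9 + 8 + 5 + 0 = 17,$$

$$\sum_k (k - 1)\bar{R}_k = 0 + 5 + 8 + 9 + 8 - 5 + 0 = 25,$$

$$\sum_k \bar{R}_k\bar{C}_k = 0 - 5 + 0 + 9 + 8 - 5 + 0 = 7,$$

$$S = \tfrac{1}{4}(35 + 17 - 25 - 7) = 5.$$

The five separations would in fact be

$a_{16} = +1$, $a_{67} = -1$;
$a_{26} = +1$, $a_{67} = -1$;
$a_{36} = +1$, $a_{67} = -1$;
$a_{46} = +1$, $a_{67} = -1$;
$a_{56} = +1$, $a_{67} = -1$.

It is clear that the difference between the matrix $A$ and the matrix $A_1$ is
that the positions of $P_6$ and $P_7$ have been interchanged.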


Example III 


This exemplifies an answer matrix with intransitivities but no separations.



Consider 


Au depo 0 ей = +1 =ї 
+3 чагаа 4 a 41 0 
Ja dd Bl duo 8 0 —1 ES 


(, 0 -1 -2 


X бп — P 
У (в — Df = РЕ а 
k * 


> R6. 3444345715 
k 


1 
| 
© 
1 


1 


1 


On the other hand,

$$\sum_k R_k^2 = 72, \qquad T = \tfrac{1}{24}(336 - 216) = 5.$$

In general, inconsistent answer matrices will have both separations and
intransitivities. As a measure of deviation from consistency the quantity

$$\Phi = \Phi(A) = S + T$$

is suggested. In terms of this measure (since $\Phi = 0$ if and only if $S = T = 0$),
a previous theorem provides that $A$ is consistent if and only if $\Phi(A) = 0$.
Summary and Remarks

The problem of determining the degree of inconsistency within a set of
paired comparisons has been considered. A definition of consistency was given
and two fundamental types of inconsistency were defined, namely,
intransitivity and separation. The latter is intimately related to an assumed
true ordering of stimuli. Formulas were given which enable the counting
of the number of each type of inconsistency in a set of data. Proofs of these
formulas were also provided. It can be shown that separations can occur
without intransitivities and vice versa. In general, however, inconsistent
data will contain both separations and intransitivities.

The criterion of consistency developed in this paper is made up of two
components: intransitivities and separation errors. The counting of separation
errors is appropriate only where an a priori ordering is assumed. However,
S may violate the assumed order, and re-order the stimuli in a way which
appears consistent to him. One may call a set of responses relatively consistent
if there exists some ordering relative to which the responses are consistent,
i.e., there are no intransitivities. It is in fact easily seen that a set of
responses is relatively consistent if and only if there are no intransitivities.

Implicit in the model is that the stimuli are thought of as being presented
simultaneously. If one is interested in the effect of order of presentation
upon choice and consistency, a modification of the method may be made.
One could consider each stimulus as being a composite of the original stimulus
with its order of presentation, and treat each composite stimulus as if it
were presented simultaneously, i.e., as the stimuli were treated in the above
model.


REFERENCES

[1] Gerard, H. B. Some factors affecting an individual's estimate of his probable success
in a group situation. J. abnorm. soc. Psychol., 1956, 52, 23[illegible].
[2] Kendall, M. G. Rank correlation methods. (2nd ed.) New York: Hafner, 1955.


Manuscript received 12/10/56
Revised manuscript received 5/13/57


PSYCHOMETRIKA— VOL. 23, No. 1 
MARCH, 1958 


PROPERTIES OF THE ITEM SCORE MATRIX 


ANGUS G. MACLEAN
CALIFORNIA TEST BUREAU 


A method of deriving from the item score matrix all the usual statistics
describing the performance on a test of a group of examinees is given. Since
this matrix usually is not actually written out, but is implicit in a set of
punched cards, a method of working from a more compact matrix $F$ is described.
A numerical example is presented. Applications and advantages of
the method are cited, as compared with that of recording only the examinees' 
test scores and the item difficulties. 


Equally Weighted Items

An item score matrix $(X)$ is an $N$ by $n$ rectangular matrix with elements
$X_{si}$ all of which are either 1 or 0. Each row of $(X)$ is a row vector $(X_s)$
which lists the item scores of student $s$. If items are to be weighted equally
the sum of the elements of $(X_s)$ is $\sum_i X_{si} = X_{st}$, the test score of student
$s$. The sum of the test scores of all students in the sample is

(1) $T_t = \sum_{s=1}^{N} X_{st}$,

the sum of all elements of $(X)$.

The column sums of $(X)$ are of interest since

(2) $T_i = \sum_{s=1}^{N} X_{si}$

is the number of students responding correctly to item $i$.

The square of the test score for student $s$ is obtainable by premultiplying
the row vector $(X_s)$ by its transpose, a procedure which yields a square
symmetric matrix of unit rank:

(3) $(X_s^2) = (X_s)'(X_s)$.

The sum of all elements of this matrix is $X_{st}^2$.

Some of the operations to be discussed lead to scalar values, others to
matrices, the sums of whose elements are those values. For the purposes of
clarity, therefore, all symbols for matrices are enclosed in parentheses, while
symbols not so enclosed will denote scalars.

The elements of $(X_s)'(X_s)$ are the products $X_{si}X_{sj}$ for student $s$. Therefore

(4) $X_{st}^2 = \sum_i \sum_j X_{si}X_{sj}$.



In general, the square of a sum may be obtained by squaring the row vector 
whose elements are the sum's components, then summing the elements of 
the square matrix so obtained. 

Summing (4) over the N students gives 


(5) $S_t = \sum_s X_{st}^2 = \sum_s \sum_i \sum_j X_{si}X_{sj}$.

$S_t$ is also obtained by summing the elements of a square symmetric matrix
$(S)$ obtained by

(6) $(S) = (X)'(X)$.

It could also be obtained by adding the $N$ matrices $(X_s^2)$ obtained by (3),
that is,

(7) $(S) = (X)'(X) = \sum_{s=1}^{N} (X_s^2)$.

The side elements of $(S)$ are the cross-product sums $S_{ij}$ of the columns
of $(X)$, while the diagonal elements $S_i$ are the result of multiplying the columns
by themselves. That is,

(8) $S_i = \sum_s X_{si}^2$,

(9) $S_{ij} = \sum_s X_{si}X_{sj}$.

$T$ and $S$ always denote summation over the $N$ individuals in the sample.
They are the statistics used in calculating standard deviations and correlations,
as follows:

(10) $\sigma_i = \sqrt{L_i}/N$,

(11) $r_{ij} = L_{ij}/\sqrt{L_iL_j}$,

in which

(12) $L_i = NS_i - T_i^2$,

(13) $L_{ij} = NS_{ij} - T_iT_j$.

It so happens, when scores are either 1 or 0, that

(14) $S_i = T_i = f_i$,

and

(15) $S_{ij} = f_{ij}$,



where $f_i$ denotes the number of students scoring 1 on item $i$ and $f_{ij}$ the
number scoring 1 on both $i$ and $j$. In other words, counting may be substituted
for adding and multiplying; the matrix $(S)$ obtained by the operation
$(X)'(X)$ is identical with the $F$ (frequency) matrix described in a recent
paper on item selection methods [1]. This matrix $(F)$ can be easily obtained
by IBM machines. It should be remarked that (11) yields phi coefficients
when scores are dichotomous.

A procedure has thus been given for obtaining the usual descriptive
statistics from the matrix of item scores. In addition such a matrix will
yield a great deal of other information which a list of test scores will not.
From $(X)$ itself the item difficulties (and, of course, their mean and variance)
may be obtained as well as the item variances, test scores, and the sum of
test scores of those responding correctly to any item. This last statistic is
useful in item selection and may be considered as the product of column $i$
with the column of row sums, i.e.,

(16) $S_{it} = \sum_s X_{si}X_{st}$.

From $(X)'(X)$ we can obtain the same information plus interitem and item-test
(point biserial) correlations, Kuder-Richardson reliability estimates,
etc. The relevant formulas and item selection procedures are discussed in [1].


Differentially Weighted Items


Consider now the more general case of differentially weighted items.
The foregoing discussion and reference deal with the special case in which
every item is given a weight of unity in the general formula for a test score
composed of a linear sum of weighted item scores:

(17) $X_{sw} = w_1X_{s1} + w_2X_{s2} + \cdots + w_nX_{sn}$.

In matrix notation (17) is equivalent to

(18) $X_{sw} = (X_s)(W)'$,

where $(W)$ is the row vector of item weights:

(19) $(W) = (w_1, w_2, \ldots, w_n)$.

If a matrix of weighted item scores is desired, perform the operation
$(X)(D_w)$, where $(D_w)$ is a diagonal matrix with elements $w_1$, $w_2$, etc. This
leaves the rows unsummed, whereas $(X)(W)'$ sums them. The following
operations yield the results indicated:
$(X_s)(D_w)$ = row of weighted item scores for student $s$.

$(X_s)(W)'$ = weighted test score of student $s$; sum of elements of $(X_s)(D_w)$.

$(X)(D_w)$ = $N$ by $n$ matrix of weighted item scores.

$(X)(W)'$ = column of $N$ weighted test scores; sums of rows of $(X)(D_w)$.

$(D_w)(F)(D_w)$ = square symmetric matrix of order $n$ exhibiting the weighted
$S_i$ and $S_{ij}$ values, i.e., sums of squared weighted item scores
and sums of their cross products. This is the matrix $(S_w)$.

$(D_w)(F)(W)'$ = column of the $n$ values of the weighted $S_{it}$; row sums of $(D_w)(F)(D_w)$.

$(W)(F)(W)'$ = $S_w$, the sum of squared weighted test scores. This is the sum of
all elements of $(D_w)(F)(D_w)$.


$T_w$ may be obtained by summing the elements of $(X)(D_w)$ or $(X)(W)'$,
and the standard deviation of the weighted test scores will be

(20) $\sigma_w = \sqrt{NS_w - T_w^2}\,/N$.

If the squares of the individual weighted test scores are desired they
may be obtained by $(W)(X_s)'(X_s)(W)'$ or by summing the elements of
$(D_w)(X_s)'(X_s)(D_w)$ but it would be easier to square individually the elements
of $(X)(W)'$ already obtained.

Of course, the column sums of $(X)(D_w)$ are $f_{iw} = w_if_i$ and the row vector
of these is equal to $(f_i)(D_w)$. The weighted $S_i$ and $S_{ij}$ in $(D_w)(F)(D_w)$ are
equal to

(21) $S_{iw} = \sum_s w_i^2X_{si}^2 = w_i^2S_i$,

and

(22) $S_{ijw} = \sum_s w_iX_{si}w_jX_{sj} = w_iw_jS_{ij}$.

The foregoing techniques are applicable where item analysis is to be
performed on a test composed of weighted items. Alternatively, if item
scores had been punched 1 or 0 and it was subsequently decided to weight
the items differentially, the mean, variance, and reliability of the revised
version might be determined by these techniques. The procedure would
employ the original $F$ matrix, if $F$ had been determined initially, or would
generate $F$, and then apply the weighting matrices $(W)$ or $(D_w)$ to produce
the desired information.


analysis is to be 
ernatively, if item 


Illustrative Example 


Suppose that five students* made the following scores on a set of four items:

                 Item
Student      1     2     3     4     Score
   1         1     0     1     0       2
   2         1     0     0     1       2
   3         0     1     1     1       3
   4         0     1     0     0       1
   5         1     1     1     0       3
  f_i        3     3     3     2      11 = T
  p_i      .60   .60   .60   .40

                                  Row sums
                      3  1  2  1     7
(S) or (F) = (X)'(X) = 1  3  2  1     7
                      2  2  3  1     8
                      1  1  1  2     5
                                    27 = S

*This N is chosen purely for illustrative convenience. In practice a representative sample of 200 or more cases is recommended to ensure greater reliability of the statistics derived.


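(This can be checked by squaring the row sums of X.) Usual statistics:

$\bar{X} = T/N = 11/5 = 2.2.$

$\bar{p} = T/nN = 11/20 = .55.$

$\sigma^2 = (NS - T^2)/N^2 = (135 - 121)/25 = 14/25 = .56.$

Kuder-Richardson formula 20,

$\dfrac{n}{n-1}\cdot\dfrac{\sigma^2 - \sum p_i q_i}{\sigma^2} = \dfrac{4}{3}\cdot\dfrac{.56 - .96}{.56} = -.95,$

is an index of item homogeneity; a negative value indicates a tendency for the items to be negatively intercorrelated. Inspection of (X) confirms this.

The phi coefficient, between items 1 and 4, is

$\phi_{14} = \dfrac{N f_{14} - f_1 f_4}{\sqrt{f_1(N - f_1)}\,\sqrt{f_4(N - f_4)}} = \dfrac{5(1) - 3(2)}{\sqrt{3(2)}\,\sqrt{2(3)}} = -\dfrac{1}{6} = -.17.$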
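The usual statistics above can be checked with a short sketch in plain Python. It assumes the item-score matrix (X) as it can be read from the example (five students, four items); the function and variable names are illustrative and are not taken from the article.

```python
from fractions import Fraction
from math import sqrt

# Item-score matrix (X): one row per student, one column per item (1 = pass, 0 = fail).
X = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
N, n = len(X), len(X[0])


def cross_products(scores):
    """Return F = X'X: item frequencies on the diagonal, joint frequencies off it."""
    cols = list(zip(*scores))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


F = cross_products(X)
f = [F[i][i] for i in range(n)]        # item frequencies f_i
T = sum(f)                             # sum of test scores
S = sum(sum(row) for row in F)         # sum of squared test scores

mean = Fraction(T, N)
variance = Fraction(N * S - T * T, N * N)

# Kuder-Richardson formula 20 from the item proportions p_i = f_i / N.
sum_pq = sum(Fraction(fi, N) * (1 - Fraction(fi, N)) for fi in f)
kr20 = Fraction(n, n - 1) * (variance - sum_pq) / variance


def phi(i, j):
    """Phi coefficient between items i and j (0-based indices)."""
    numerator = N * F[i][j] - f[i] * f[j]
    denominator = sqrt(f[i] * (N - f[i]) * f[j] * (N - f[j]))
    return numerator / denominator


print(mean, variance, float(kr20), phi(0, 3))
```

Exact fractions are used for the variance and the reliability so that the values can be compared directly with the hand computation; only the phi coefficient needs a square root.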


In many situations (but usually not in item selection) an L matrix is derived from (S), with side elements $L_{ij} = N S_{ij} - T_i T_j$, and diagonal elements $L_i = N S_i - T_i^2$. The side elements are the numerators of the correlation coefficients; the denominators are the geometric means of the appropriate diagonal elements. In matrix notation

(23)    $(L) = N(S) - (T)'(T),$

where (T) is a row vector containing the sums of scores on each variable and N is, of course, a scalar. In the case of items scored 1 or 0 this becomes

(24)    $(L) = N(F) - (f)'(f),$

where (f) is the row vector of item frequencies (number of students scoring 1 on each item). Then, in the example,

        15   5  10   5       9  9  9  6        6  -4   1  -1
(L) =    5  15  10   5   -   9  9  9  6   =   -4   6   1  -1
        10  10  15   5       9  9  9  6        1   1   6  -1
         5   5   5  10       6  6  6  4       -1  -1  -1   6

It is evident from (L) that four out of the six interitem correlations are negative. It may be noted that the L matrix may be converted into an item covariance matrix by dividing every element by $N^2$.

Now suppose that it is desirable to apply a set of weights to the items as follows:

Item Number    1    2    3    4
w_i            3    3    5    1


Then:

                                Row sums of (X)(D_w)
                                = (X)(W)' = X_w = Σ w_i X_i
               3  0  5  0               8
               3  0  0  1               4
(X)(D_w) =     0  3  5  1               9
               0  3  0  0               3
               3  3  5  0              11
                                       35 = T_w

and

                                    Row sums = (D_w)(F)(W)'
                   27   9  30   3           69
(D_w)(F)(D_w) =     9  27  30   3           69
                   30  30  75   5          140
                    3   3   5   2           13
                                           291 = (W)(F)(W)' = S_w

$\sigma_w^2 = (N S_w - T_w^2)/N^2 = 230/25 = 9.2.$
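The L matrix of (24) and the weighted results can be reproduced with a minimal sketch, again in plain Python. The matrix (X) and the weights 3, 3, 5, 1 are those of the example as read here; the helper names are assumptions made for illustration.

```python
from fractions import Fraction

X = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
w = [3, 3, 5, 1]                     # item weights, the row vector (W)
N, n = len(X), len(X[0])

# F = X'X and the item frequencies f_i on its diagonal.
cols = list(zip(*X))
F = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
f = [F[i][i] for i in range(n)]

# (24): L = N F - f'f; side elements are numerators of the interitem correlations.
L = [[N * F[i][j] - f[i] * f[j] for j in range(n)] for i in range(n)]

# (X)(D_w): weighted item scores, rows left unsummed.
XDw = [[x * wi for x, wi in zip(row, w)] for row in X]
weighted_scores = [sum(row) for row in XDw]      # (X)(W)'
T_w = sum(weighted_scores)

# (D_w)(F)(D_w): weighted sums of squares and cross products.
DFD = [[w[i] * F[i][j] * w[j] for j in range(n)] for i in range(n)]
row_sums = [sum(row) for row in DFD]             # (D_w)(F)(W)'
S_w = sum(row_sums)                              # (W)(F)(W)'

variance_w = Fraction(N * S_w - T_w ** 2, N * N)
print(L, weighted_scores, row_sums, S_w, float(variance_w))
```

Note that the weighted quantities are obtained from F and the weights alone, which is the point of the technique: the raw score matrix is needed only to form F once.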


REFERENCE

[1] MacLean, A. G. and Tait, A. T. Some computational short-cuts in the development or analysis of tests. J. appl. Psychol., 1954, 38, 260-203.

Manuscript received 2/8/57

Revised manuscript received 5/16/57


PSYCHOMETRIKA, VOL. 23, NO. 1
MARCH, 1958


THE COUNSELING ASSIGNMENT PROBLEM*


JOE H. WARD, JR.


AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER


A disposition index, DI, which provides information about each possible placement to be considered in a personnel classification situation is discussed. The index is readily computed by machine methods and can be used by counselors required to make assignments. The use of the disposition index provides an adequate approximation to optimal solutions obtained by other methods.


The personnel classification problem has been discussed previously by 
several authors [1, 2, 4, 5]. This problem has been shown to be similar to the 
Hitchcock-Koopmans transportation problem, which is a special case of 
linear programming [6]. The techniques presented in the following discussion 
have a direct analogy to the problem of a transportation scheduling supervisor 
who is responsible for transporting products from several origins to several 


destinations in an economical manner.

The problem of assigning personnel to jobs generally has been stated as follows [6]: Given n persons to be assigned to n jobs and the productivity of the ith person on the jth job, find an assignment of persons to jobs such that total productivity is a maximum. A solution to this problem can be determined by linear programming techniques [2, 3, 6]; if the problem is not too large, the assignments can be determined by automatic methods without the intervention of counselors. This problem is of particular concern in military and large industrial personnel assignments but is not closely related to individual vocational guidance.

A major difficulty with this approach to the problem is that the productivity values are generally only crude estimates of the value of a person on a job. Consequently there is still need for intervention by counselors to account for unforeseen significant information. An additional problem in the use of a completely counselor-free system is that it is quite difficult to sell, operationally. This is perhaps due to the drastic, noticeable system change brought about by conversion from the old to the completely automated system.

A reasonable approach would involve continuing the present counseling systems and providing increasingly valuable assignment information that will lead to the optimal solution. Continuous gradual improvement of the information supplied to the counselor assignment process will result in more effective assignments. The procedure may ultimately converge to an automatic system, with human intervention decreasing with increasing adequacy of productivity information. This procedure will have the advantage of gradual implementation, leading readily to acceptance because of minimum interference with existing procedures, and more adequate utilization of personnel. The following material will include a description of a placement or disposition index which can fit into a counselor assignment system.

*This report is based on work done under ARDC Project No. 702, in support of the research and development program of the Air Force Personnel and Training Research Center, Lackland Air Force Base, Texas. Permission is granted for reproduction, translation, publication, use, and disposal in whole and in part by or for the United States Government.


A Counseling Assignment Problem 


Consider the problem of assigning n men to n jobs given the productivity, $c_{ij}$, of the ith man on the jth job. In the counseling situation it would be desirable to have information (perhaps represented by a single index) associated with each possible placement that would reflect characteristics of the entire $c_{ij}$ array. In order to consider the relative merits of particular placements, a counselor should have not only an individual assignee's productivities (as indicated by an aptitude score, achievement score, or some other measure) but also an indication of the productivities of all other personnel to be placed.

Assume that an individual counselor is required to assign three men to three jobs, and suppose the productivity index matrix is as follows:

                  Jobs
              1    2    3
         1    8    7    6
Persons  2    5    1    0
         3    6    4    1

Assume further that the counselor can see only one man's productivities (or perhaps test scores) at a time and that he adopts the policy of placing a man in an available job in which he has the highest productivity. If the men come to the counselor in the above order, the assignment would be as follows:

Person   Job
  1       1
  2       2
  3       3

The first man's highest index is on job one; the counselor will therefore place man one on job one. There are then two jobs remaining; since man two has a higher productivity on job two than on job three, he will have job two. Finally, the third man will be placed on job three. This sequence was selected as an example because it would provide the lowest possible sum, $c_{11} + c_{22} + c_{33} = 10$, and therefore would be considered the worst assignment. The maximum sum, $c_{13} + c_{21} + c_{32} = 15$, would have resulted only if the men had entered in the sequence 2, 3, 1. If there had been a completely automatic system which would give the optimal assignment, $c_{13} + c_{21} + c_{32} = 15$, all would be well if there were no possibilities of additional information about productivities.

Assume now that the counselor has determined (before talking with the men) the optimal assignment and feels confident of his position. When man one enters, the counselor plans to place him on job three where his productivity is six. However, after further investigation, the counselor finds it is impossible to make this placement; for lack of a second recommended placement, the counselor places the first man on job one (productivity equal to 8) in his effort to maximize the assignment sum. It is now apparent that the counselor is on his way to making the worst placements again and will be forced into the minimum assignment sum $c_{11} + c_{22} + c_{33} = 10$.

Even though this example is made to demonstrate the worst situation, it is still apparent that it would be desirable to provide the counselor with information reflecting the relative merits of each placement. The disposition index, DI, that is to be developed should provide this type of information and should be expected to result in efficient assignments at small computational expense.
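The three-man example can be checked by brute force. The sketch below assumes the productivity array 8, 7, 6 / 5, 1, 0 / 6, 4, 1 (as it can be recovered from the row and column sums and DI values given later in the article) and uses 0-based indices for men and jobs; `sequential_policy` is a hypothetical name for the one-man-at-a-time rule described above.

```python
from itertools import permutations

# Productivity array c_ij: rows are men, columns are jobs.
C = [
    [8, 7, 6],
    [5, 1, 0],
    [6, 4, 1],
]


def all_assignment_sums(C):
    """Map every one-to-one assignment (a tuple of job indices) to its sum."""
    n = len(C)
    return {jobs: sum(C[i][jobs[i]] for i in range(n))
            for jobs in permutations(range(n))}


def sequential_policy(C, order):
    """Each man, in order of arrival, takes the open job on which he is most productive."""
    open_jobs = list(range(len(C[0])))
    placed = {}
    for man in order:
        job = max(open_jobs, key=lambda q: C[man][q])   # first maximum wins ties
        placed[man] = job
        open_jobs.remove(job)
    return placed


def total(C, placed):
    return sum(C[man][job] for man, job in placed.items())


sums = all_assignment_sums(C)
worst, best = min(sums.values()), max(sums.values())

arrival_1_2_3 = sequential_policy(C, [0, 1, 2])
arrival_2_3_1 = sequential_policy(C, [1, 2, 0])
print(worst, best, total(C, arrival_1_2_3), total(C, arrival_2_3_1))
```

Enumeration is practical only for very small n, since the number of assignments is n!; it is used here purely to confirm the sums quoted in the text.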


Development of a Disposition Index, DI 
Consider, first, placing the person p on the job q. Having made that placement, assume that every possible assignment of the other n - 1 persons is equally likely. There are then (n - 1)! possible sums containing $c_{pq}$, and the probability of each is 1/(n - 1)!.

Now consider the mean value, $E(S_{pq})$, of the assignment sums containing $c_{pq}$, and consequently the mean value, $E(s_{pq}) = E(S_{pq})/n$, of the productivities contained in the (n - 1)! sums involving $c_{pq}$. Having selected the value $c_{pq}$, the sums contain only elements from the (n - 1) remaining rows and columns of the $c_{ij}$ array. Now each element, $c_{ij}$, of the resulting square array of order (n - 1) is contained in (n - 2)! of the (n - 1)! sums; it follows that the mean values $E(S_{pq})$ and $E(s_{pq})$ are obtained as follows:

$E(S_{pq}) = \dfrac{(n-1)!\,c_{pq} + (n-2)!\,(c_{..} - c_{p.} - c_{.q} + c_{pq})}{(n-1)!},$

where

$c_{p.} = \sum_{j=1}^{n} c_{pj}; \qquad c_{.q} = \sum_{i=1}^{n} c_{iq}; \qquad c_{..} = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij};$

$E(S_{pq}) = [(n-1)c_{pq} + c_{..} - c_{p.} - c_{.q} + c_{pq}]/(n-1)$

(1)    $E(S_{pq}) = [n c_{pq} - c_{p.} - c_{.q} + c_{..}]/(n-1),$

and, dividing by n,

(2)    $E(s_{pq}) = \dfrac{1}{n}E(S_{pq}) = \dfrac{1}{n(n-1)}[n c_{pq} - c_{p.} - c_{.q} + c_{..}].$

Now consider the mean value, $E(S_{\bar{p}\bar{q}})$, of the sums not containing $c_{pq}$; consequently, the mean value $E(s_{\bar{p}\bar{q}}) = E(S_{\bar{p}\bar{q}})/n$ of the productivities contained in sums not involving $c_{pq}$. There are (n - 1)(n - 1)! such sums, and the values of $E(S_{\bar{p}\bar{q}})$ and $E(s_{\bar{p}\bar{q}})$ are obtained as follows:

$E(S_{\bar{p}\bar{q}}) = \dfrac{1}{(n-1)(n-1)!}\left[(n-1)!\,c_{..} - (n-1)!\,c_{pq} - (n-2)!\,(c_{..} - c_{p.} - c_{.q} + c_{pq})\right]$

(3)    $E(S_{\bar{p}\bar{q}}) = \dfrac{1}{(n-1)^2}[(n-2)c_{..} + c_{p.} + c_{.q} - n c_{pq}],$

and, dividing by n,

(4)    $E(s_{\bar{p}\bar{q}}) = \dfrac{1}{n}E(S_{\bar{p}\bar{q}}) = \dfrac{1}{n(n-1)^2}[(n-2)c_{..} + c_{p.} + c_{.q} - n c_{pq}].$

Now consider the difference $D_{pq} = E(S_{pq}) - E(S_{\bar{p}\bar{q}})$, between the mean sum obtained when placing the pth person on the qth job and not making that particular placement. From (1) and (3),

$D_{pq} = \dfrac{1}{n-1}[n c_{pq} - c_{p.} - c_{.q} + c_{..}] - \dfrac{1}{(n-1)^2}[(n-2)c_{..} + c_{p.} + c_{.q} - n c_{pq}]$

$\phantom{D_{pq}} = \dfrac{1}{(n-1)^2}[n(n-1)c_{pq} - (n-1)(c_{p.} + c_{.q}) + (n-1)c_{..} - (n-2)c_{..} - (c_{p.} + c_{.q}) + n c_{pq}]$

(5)    $D_{pq} = \dfrac{1}{(n-1)^2}[n^2 c_{pq} - n(c_{p.} + c_{.q}) + c_{..}],$

and, dividing by n,

(6)    $d_{pq} = E(s_{pq}) - E(s_{\bar{p}\bar{q}}) = \dfrac{1}{n}E(S_{pq}) - \dfrac{1}{n}E(S_{\bar{p}\bar{q}}) = D_{pq}/n = \dfrac{1}{n(n-1)^2}[n^2 c_{pq} - n(c_{p.} + c_{.q}) + c_{..}].$


The value $D_{pq}$ represents the difference between the mean value of the assignment sums involving $c_{pq}$ and the mean value of the assignment sums not involving $c_{pq}$. The value $d_{pq}$ represents the difference between the mean value of the productivities contained in assignment sums involving $c_{pq}$ and the mean value of the productivities contained in assignment sums not involving $c_{pq}$. It is apparent then that as the value of $d_{pq}$ or $D_{pq}$ increases the placement of person p on job q is more likely to result in a larger assignment sum.

Some interesting properties of these equations are the following:

(7)    $\sum_{i=1}^{n} E(S_{iq}) = \sum_{j=1}^{n} E(S_{pj}) = \sum_{i=1}^{n} E(S_{\bar{i}\bar{q}}) = \sum_{j=1}^{n} E(S_{\bar{p}\bar{j}}) = c_{..};$

(8)    $\sum_{i=1}^{n} E(s_{iq}) = \sum_{j=1}^{n} E(s_{pj}) = \sum_{i=1}^{n} E(s_{\bar{i}\bar{q}}) = \sum_{j=1}^{n} E(s_{\bar{p}\bar{j}}) = c_{..}/n;$

consequently,

(9)    $\sum_{i=1}^{n} D_{iq} = \sum_{j=1}^{n} D_{pj} = \sum_{i=1}^{n} d_{iq} = \sum_{j=1}^{n} d_{pj} = 0.$
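Equations (1), (3), (5) and property (9) can be verified numerically by enumerating every assignment of a small array. The following sketch does this with exact fractions; the function names and the 4 by 4 array are hypothetical and chosen only for the check.

```python
from fractions import Fraction
from itertools import permutations


def enumerated_means(C, p, q):
    """Mean assignment sum over assignments containing c_pq, and over those that do not."""
    n = len(C)
    with_pq, without_pq = [], []
    for jobs in permutations(range(n)):
        s = sum(C[i][jobs[i]] for i in range(n))
        (with_pq if jobs[p] == q else without_pq).append(s)
    return (Fraction(sum(with_pq), len(with_pq)),
            Fraction(sum(without_pq), len(without_pq)))


def closed_forms(C, p, q):
    """Equations (1), (3) and (5) evaluated from the marginal totals."""
    n = len(C)
    c = C[p][q]
    row = sum(C[p])                       # c_p.
    col = sum(r[q] for r in C)            # c_.q
    tot = sum(sum(r) for r in C)          # c_..
    e_with = Fraction(n * c - row - col + tot, n - 1)                    # (1)
    e_without = Fraction((n - 2) * tot + row + col - n * c, (n - 1) ** 2)  # (3)
    d = Fraction(n * n * c - n * (row + col) + tot, (n - 1) ** 2)        # (5)
    return e_with, e_without, d


def check(C):
    """True when the closed forms agree with enumeration and every row of D sums to zero."""
    n = len(C)
    for p in range(n):
        row_of_d = []
        for q in range(n):
            e_with, e_without, d = closed_forms(C, p, q)
            if (e_with, e_without) != enumerated_means(C, p, q):
                return False
            if d != e_with - e_without:
                return False
            row_of_d.append(d)
        if sum(row_of_d) != 0:            # property (9)
            return False
    return True


example_3x3 = [[8, 7, 6], [5, 1, 0], [6, 4, 1]]
hypothetical_4x4 = [[3, 9, 2, 7], [4, 4, 8, 1], [6, 0, 5, 5], [2, 7, 3, 9]]
print(check(example_3x3), check(hypothetical_4x4))
```

For person 1 on job 1 of the three-man array, for instance, the two assignments containing $c_{11}$ have sums 10 and 12 (mean 11), and the four others have mean 13.5, so that $D_{11} = -2.5$.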


This indicates that the values of $D_{pq}$ and $d_{pq}$ are in a type of deviation form simultaneously by rows and by columns. Putting the $c_{ij}$ matrix in deviation form by rows (or columns) first and then in deviation form by columns (or rows), the deviational form, $\delta_{pq}$, becomes

(10)    $\delta_{pq} = \dfrac{1}{n^2}[n^2 c_{pq} - n(c_{p.} + c_{.q}) + c_{..}].$

Therefore it can be seen that $\delta_{pq}$, obtained by putting the $c_{ij}$ matrix in deviation form by rows and columns, differs from $D_{pq}$ only with respect to the factor $1/n^2$, whereas $D_{pq}$ involves the factor $1/(n-1)^2$.

Since it is frequently desired to assign m persons to n jobs, where $m \geq n$, consider the expression for $D_{pq}$ and $d_{pq}$ under these more general conditions.


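$E(S_{pq}) = \dfrac{(m-n)!}{(m-1)!}\left[\dfrac{(m-1)!}{(m-n)!}\,c_{pq} + \dfrac{(m-2)!}{(m-n)!}\,(c_{..} - c_{p.} - c_{.q} + c_{pq})\right],$

where

$c_{p.} = \sum_{j=1}^{n} c_{pj}; \qquad c_{.q} = \sum_{i=1}^{m} c_{iq}; \qquad c_{..} = \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij};$

(11)    $E(S_{pq}) = \dfrac{1}{m-1}[m c_{pq} - c_{p.} - c_{.q} + c_{..}],$

and

(12)    $E(s_{pq}) = \dfrac{1}{n}E(S_{pq}) = \dfrac{1}{n(m-1)}[m c_{pq} - c_{p.} - c_{.q} + c_{..}].$

$E(S_{\bar{p}\bar{q}}) = \dfrac{(m-n)!}{(m-1)(m-1)!}\left[\dfrac{(m-1)!}{(m-n)!}\,c_{..} - \dfrac{(m-1)!}{(m-n)!}\,c_{pq} - \dfrac{(m-2)!}{(m-n)!}\,(c_{..} - c_{p.} - c_{.q} + c_{pq})\right]$

(13)    $E(S_{\bar{p}\bar{q}}) = \dfrac{1}{(m-1)^2}[(m-2)c_{..} + c_{p.} + c_{.q} - m c_{pq}],$

and

(14)    $E(s_{\bar{p}\bar{q}}) = \dfrac{1}{n}E(S_{\bar{p}\bar{q}}) = \dfrac{1}{n(m-1)^2}[(m-2)c_{..} + c_{p.} + c_{.q} - m c_{pq}].$

Then the difference $D_{pq} = E(S_{pq}) - E(S_{\bar{p}\bar{q}})$ leads to the expression

(15)    $D_{pq} = \dfrac{1}{(m-1)^2}[m^2 c_{pq} - m(c_{p.} + c_{.q}) + c_{..}].$

Dividing by n gives

(16)    $d_{pq} = \dfrac{1}{n}D_{pq} = \dfrac{1}{n(m-1)^2}[m^2 c_{pq} - m(c_{p.} + c_{.q}) + c_{..}].$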


It is important to notice the similarities to the several expressions previously developed. We can write

$E(S_{pq}) = k_1[n c_{pq} - c_{p.} - c_{.q}] + k_2,$

$E(S_{\bar{p}\bar{q}}) = k_3[n c_{pq} - c_{p.} - c_{.q}] + k_4,$

$D_{pq} = k_5[n c_{pq} - c_{p.} - c_{.q}] + k_6,$

$d_{pq} = k_7[n c_{pq} - c_{p.} - c_{.q}] + k_8.$

Thus if the magnitude of any of these indices is used as a basis for assignment, then the value

(17)    $\phi_{pq} = n c_{pq} - c_{p.} - c_{.q}$

will provide all of the distinguishing information among possible placements. The easily computed index $\phi_{pq}$ provides a large amount of information concerning the array of productivities.

There are several possible indices from which a disposition index, DI, may be chosen; the one probably most meaningful to the counselor is $E(s_{pq})$. This index is the mean value of the productivities contained in all possible assignment sums involving $c_{pq}$. It is directly related to the productivities, and it has the same interpretation for any value of n. For the more general case of m persons assigned to n jobs, where $m \geq n$, $E(s_{pq})$ is given by (12). It is therefore suggested that the disposition index, DI, be defined by (12):

(18)    $DI_{pq} = \dfrac{1}{n(m-1)}[m c_{pq} - c_{p.} - c_{.q} + c_{..}],$

where m = number of persons to be assigned,

n = number of jobs to be filled,

$c_{pq}$ = productivity of the pth person on the qth job,

$c_{p.} = \sum_{j=1}^{n} c_{pj}; \qquad c_{.q} = \sum_{i=1}^{m} c_{iq}; \qquad c_{..} = \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}.$

The Disposition Index in a Counseling Assignment System 


The disposition index, DI, reflects the relative merits of making a particular placement based upon information about the entire productivity array. The first step in using the DI would be to compute the entire matrix of $DI_{pq}$; that is, compute $DI_{pq}$ for every person on every job. If the entire DI matrix is available, placement could proceed by placing the largest DI first, next largest second, and so on until all placements have been made. If elaborate data processing equipment is available, the DI matrix can be computed after each placement to reflect the change of conditions. This should tend to provide an assignment sum that is very nearly optimal. In any case, the reduced matrix of DI can be computed after, say, every tth placement with the frequency of updating determined by the speed of available computing facilities. In actual operation, it would probably be desirable to update the DI matrix at the end of each day and at the same time furnish to counselors the DI's of the personnel to be placed the following day.

Consider the application of DI's to the simple problem presented previously. The productivity array, complete with row and column sums, is:

                     Jobs
               1     2     3    c_p.
          1    8     7     6     21
Persons   2    5     1     0      6
          3    6     4     1     11
        c_.q  19    12     7     38 = c..


It is now possible to compute the DI matrix.

$DI_{pq} = \dfrac{1}{3(2)}[3c_{pq} - c_{p.} - c_{.q} + c_{..}].$

$DI_{11} = \tfrac{1}{6}[3(8) - 21 - 19 + 38] = \tfrac{1}{6}[24 - 21 - 19 + 38] = \tfrac{1}{6}[22] = 22/6.$

$DI_{12} = \tfrac{1}{6}[3(7) - 21 - 12 + 38] = \tfrac{1}{6}[21 - 21 - 12 + 38] = \tfrac{1}{6}[26] = 26/6.$

$DI_{13} = \tfrac{1}{6}[3(6) - 21 - 7 + 38] = \tfrac{1}{6}[18 - 21 - 7 + 38] = \tfrac{1}{6}[28] = 28/6.$

The complete set of DI's is obtained by similar computations.

                     Jobs
               1      2      3     $\sum_j DI_{pj}$
          1   22/6   26/6   28/6     38/3
Persons   2   28/6   23/6   25/6     38/3
          3   26/6   27/6   23/6     38/3

In this problem the three highest DI's can be selected and the indicated placements made. Man one would be placed on job number three, man two on job one, and man three on job two; this would result in the maximum sum $c_{13} + c_{21} + c_{32} = 15$.

Notice what the counselor would do if man one could not, for some valid reason, be placed on job three. The counselor would not place man one on job one as indicated by his highest productivity but would place him in job two where his second highest DI is located, DI = 26/6. After making this assignment, the counselor would continue to fill the jobs according to values of the disposition index. The result would be an assignment that has the second highest possible value $c_{12} + c_{21} + c_{33} = 13$.

The next example is selected to demonstrate when the procedure will not give a maximum assignment sum if only one DI matrix is computed. Consider the productivity array shown below.

                     Jobs
               1     2     3    c_p.
          1    1     4     0      5
Persons   2    1     7     6     14
          3    4     7     7     18
        c_.q   6    18    13     37 = c..

The DI matrix is:

                        Jobs
                 1      2      3     $\sum_j DI_{pj}$
          1     29/6   26/6   19/6     37/3
Persons   2     20/6   26/6   28/6     37/3
          3     25/6   22/6   27/6     37/3
$\sum_i DI_{iq}$  37/3   37/3   37/3     37 = c..
iel 
The three highest values of $DI_{pq}$ are $DI_{11} = 29/6$, $DI_{23} = 28/6$, and $DI_{33} = 27/6$. Since $DI_{23}$ and $DI_{33}$ involve the same job, if man one is placed on job one, and man two is placed on job three, then it will be necessary to place man three on job two. This assignment will result in a sum which is not optimum, $c_{11} + c_{23} + c_{32} = 14$. However, if after placing the first man on job one, a new DI matrix is computed, an optimal sum will result.

                     Jobs
                 2      3     $\sum_j DI_{pj}$
Persons   2     14/2   13/2     27/2
          3     13/2   14/2     27/2
$\sum_i DI_{iq}$  27/2   27/2     27 = c..

From the new DI array it is clear that man two should be placed on job two and man three on job three. This would result in the maximum sum $c_{11} + c_{22} + c_{33} = 15$.
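The difference between using a single DI matrix and recomputing it after each placement can be sketched as follows. This is an illustrative reading of the procedure, not the author's program: `place_by_di` always takes the largest DI among the persons and jobs still open, and, as an assumption of the sketch, the last remaining person is ranked by raw productivity because (18) is undefined when m - 1 = 0.

```python
from fractions import Fraction


def disposition_index(C, persons, jobs):
    """DI of equation (18) restricted to the unplaced persons and the open jobs."""
    m, n = len(persons), len(jobs)
    if m == 1:
        # Assumption of this sketch: with one person left (m - 1 = 0) the index
        # is undefined, so the productivity itself ranks the open jobs.
        return {(p, q): Fraction(C[p][q]) for p in persons for q in jobs}
    row = {p: sum(C[p][q] for q in jobs) for p in persons}
    col = {q: sum(C[p][q] for p in persons) for q in jobs}
    tot = sum(row.values())
    return {(p, q): Fraction(m * C[p][q] - row[p] - col[q] + tot, n * (m - 1))
            for p in persons for q in jobs}


def place_by_di(C, recompute=False):
    """Place the largest available DI first; optionally recompute DI after each placement."""
    persons = list(range(len(C)))
    jobs = list(range(len(C[0])))
    di = disposition_index(C, persons, jobs)
    placed = {}
    while persons and jobs:
        if recompute:
            di = disposition_index(C, persons, jobs)
        p, q = max(((p, q) for p in persons for q in jobs), key=lambda pq: di[pq])
        placed[p] = q
        persons.remove(p)
        jobs.remove(q)
    return placed


def total(C, placed):
    return sum(C[p][q] for p, q in placed.items())


C2 = [[1, 4, 0], [1, 7, 6], [4, 7, 7]]      # second example (0-based indices)
single = place_by_di(C2)
updated = place_by_di(C2, recompute=True)
print(single, total(C2, single))
print(updated, total(C2, updated))
```

With the second example the single-matrix run gives the sum 14 and the recomputed run gives 15, matching the text; ties are broken in favor of the first pair encountered, which is a choice of this sketch.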

Now consider a much larger assignment problem which involves assignment of three different kinds of people to five different kinds of jobs. The following array presents, rather than productivities, values which might represent the cost of having a person type in a particular type job. The matrix is bordered by the frequencies of men available and jobs to be filled, as well as by row and column totals.

                            Job Types                     Persons
                  1       2       3       4       5      Available     c_p.
           1     57      60      55      54      62         40        13940
Person     2     53      52      50      59      51         80        12890
Types      3     58      63      61      56      64        120        14550
Job Quota        10      20      30      80     100        240
c_.q          13480   14120   13520   13600   14240               3,334,800 = c..

A solution based upon such a cost matrix requires a minimization rather than a maximization process. Consequently, it will be necessary to select the smallest values of the DI matrix. From the marginal totals it is then possible to compute the DI matrix.

                        3,321,060   3,321,140   3,320,540   3,320,220   3,321,500
DI = 1/[240(239)]  x    3,321,150   3,320,270   3,320,390   3,322,470   3,319,910
                        3,320,690   3,321,250   3,321,370   3,320,090   3,321,370

Starting with the smallest value and placing the personnel in ascending order of DI, the following minimum sum assignment is obtained:

                        Job Types                Persons
                  1     2     3     4     5     Available
           1           10    30                     40
Person     2                             80         80
Types      3     10    10          80    20        120
Job Quota        10    20    30    80   100        240

The sum associated with this assignment is

10(60) + 30(55) + 80(51) + 10(58) + 10(63) + 80(56) + 20(64) = 13,300.

This example provides an optimal sum without recomputing the DI matrix.
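The larger example can be reproduced without expanding the array to 240 rows and columns, by weighting the marginal totals with the numbers of persons available and the job quotas. The sketch below works with the bracketed numerator of (18) only, since the positive factor 1/[n(m - 1)] does not change the ordering; the function names are illustrative assumptions.

```python
cost = [
    [57, 60, 55, 54, 62],
    [53, 52, 50, 59, 51],
    [58, 63, 61, 56, 64],
]
available = [40, 80, 120]          # persons available of each type
quota = [10, 20, 30, 80, 100]      # jobs to be filled of each type


def di_numerators(cost, available, quota):
    """Bracketed term of (18), m*c_pq - c_p. - c_.q + c_.., for each person type and job type."""
    m = sum(available)                                     # here m = n = 240
    row = [sum(qj * c for qj, c in zip(quota, r)) for r in cost]               # c_p.
    col = [sum(a * cost[i][j] for i, a in enumerate(available))
           for j in range(len(quota))]                                         # c_.q
    tot = sum(a * r for a, r in zip(available, row))                           # c_..
    return [[m * cost[i][j] - row[i] - col[j] + tot for j in range(len(quota))]
            for i in range(len(available))]


def fill_in_ascending_order(index, available, quota):
    """Take cells from smallest index upward, placing as many persons as still fit."""
    persons_left, jobs_left = list(available), list(quota)
    alloc = [[0] * len(quota) for _ in available]
    cells = sorted((index[i][j], i, j)
                   for i in range(len(available)) for j in range(len(quota)))
    for _, i, j in cells:
        x = min(persons_left[i], jobs_left[j])
        alloc[i][j] = x
        persons_left[i] -= x
        jobs_left[j] -= x
    return alloc


index = di_numerators(cost, available, quota)
alloc = fill_in_ascending_order(index, available, quota)
total_cost = sum(alloc[i][j] * cost[i][j]
                 for i in range(len(available)) for j in range(len(quota)))
print(index[1][4], alloc, total_cost)
```

The smallest entry, 3,319,910 for person type 2 on job type 5, is filled first with all 80 persons of that type, and the remaining cells follow in ascending order as in the text.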


Other Possible Disposition Indexes 


It might be desirable to consider the variances associated with the expected values in order to obtain more information about the distribution of sums associated with each possible placement decision. The variances can be easily computed by machine methods and might be incorporated into a useful disposition index.


REFERENCES

[1] Brogden, H. E. An approach to the problem of differential prediction. Psychometrika, 1946, 11, 139-154.

[2] Dwyer, P. S. Solution of the personnel classification problem with the method of optimal regions. Psychometrika, 1954, 19, 11-26.

[3] Dwyer, P. S. The detailed method of optimal regions. Psychometrika, 1957, 22, 43-52.

[4] Thorndike, R. L. The problem of classification of personnel. Psychometrika, 1950, 15, 215-235.

[5] Votaw, D. F., Jr. Methods of solving some personnel-classification problems. Psychometrika, 1952, 17, 255-266.

[6] Votaw, D. F., Jr. and Dailey, J. T. Assignment of personnel to jobs. Research Bulletin 52-24, Air Training Command, Human Resources Research Center, Lackland Air Force Base, August, 1952.

Manuscript received 5/2/57

Revised manuscript received 7/29/57

PSYCHOMETRIKA, VOL. 23, NO. 1
MARCH, 1958


A RETEST METHOD OF STUDYING PARTIAL KNOWLEDGE
AND OTHER FACTORS INFLUENCING ITEM RESPONSE*


VERA T. BROWNLESS AND JOHN A. KEATS†


AUSTRALIAN COUNCIL FOR EDUCATIONAL RESEARCH


A method of studying the problem of correction for guessing and other problems associated with behavior in the test situation is described and an illustrative example presented. As far as the writers are aware this method of approach is novel but, at the same time, it covers many of the practical and theoretical points raised by other writers as reviewed in the introduction.


Awareness of some of the problems involved in tests which are presented in multiple choice form has existed since the early days of testing. One of these problems is based on the fact that the test items can be answered correctly by a person with no knowledge in the field being tested. By purely random selection from the alternatives presented in each question, such a person may obtain a nonzero score on the test. An individual may obtain any score from all correct to none correct, although results for a large group of such persons are expected to yield a group mean which is equal to (total number of questions)/n, where n is the number of choices in each question.
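The chance-score statement can be illustrated with the exact binomial distribution of scores under pure guessing. The sketch assumes, purely for illustration, a test of 20 questions with 5 choices each; these numbers and the function name are hypothetical.

```python
from fractions import Fraction
from math import comb


def chance_score_distribution(num_questions, n_choices):
    """Probability of each score 0..num_questions when every answer is a random choice."""
    p = Fraction(1, n_choices)
    return [comb(num_questions, s) * p ** s * (1 - p) ** (num_questions - s)
            for s in range(num_questions + 1)]


dist = chance_score_distribution(20, 5)          # hypothetical test: 20 items, 5 choices
mean_score = sum(score * prob for score, prob in enumerate(dist))

print(mean_score)                 # (total number of questions)/n
print(float(dist[0]))             # chance of scoring zero by guessing
print(float(dist[-1]))            # chance of a perfect score by guessing
```

Every score from none correct to all correct has nonzero probability, while the mean over a large group of pure guessers is the number of questions divided by the number of choices.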
Previous workers attacked this problem in various ways. Many, recognizing that guessing goes on to a greater or lesser degree whatever the instructions, have recommended some form of correction for guessing. In opposition to the idea of making some form of correction, a number of people, in particular Holzinger [4] and Gulliksen [3], have noted that, provided all students answer all questions, the correction factor makes no difference in the rank order of the students. Stanley [7] suggested that although no benefit is derived from the correction when the number of omits varies little from one student to another, the students' attitudes to the testing situation may be improved.

It is doubtful that any over-all guessing correction factor improves the reliability of the test. It is doubtful that all students are guessing from the same number of alternatives; in fact, it is possible that the more able students can eliminate some of the choices and are guessing from fewer choices. This problem was considered by Horst [5, 6], who gives a method that allows for elimination of some of the choices. However, as Davis [1] points out, although this approach takes account of partial knowledge it does not make allowance for wrong answers which are based on misinformation. Davis [1] suggests that when a correction formula is used it leads to overcorrection if an examinee has misconceptions, undercorrection if he has partial information, and that these two influences tend to cancel out.

*The authors wish to acknowledge help received during discussions with Dr. Frederic M. Lord and Professor S. S. Wilks.

†Present address: University of Queensland, Australia.

One difficulty in discussing guessing is to find a suitable definition of 
guessing. In this work the authors are using the one given by Granich "The 
tendency to answer questions which are unrecognized either wholly or in 
part, when an answer can not be deduced with certainty from such information 
as the student possesses" ([2], p. 155). Here no assumption has been made 
that an n-choice question actually presents n choices to the student. A 


student with some knowledge may be able to eliminate some choices and 
thus narrow the field to n - 1, n - 2, ..., or even 2 choices.


Method of Investigation 


To obtain the empirical data for this method it is necessary to administer 
the same test to a group of subjects on two occasions. The time between 
administrations should preferably be short, and no warning should be given 
to the subjects that they are going to be retested. If the responses of the 
subjects to a particular item are examined on the two occasions they will be 
found to fall into one and only one of the ten categories listed in Table 1. 
The number of subjects in each category can readily be obtained and these 
numbers pooled for all items. For example, $T_{rr}$ denotes the number of times any item was marked correctly at both administrations by any subject.


Analysis of the Data 

Detailed observation and questioning of subjects while they are taking the tests would probably suggest a large number of factors operating to produce a given response category. For the present, rather simple assumptions will be made, not because they are thought to cover all or even the majority of cases, but to facilitate the description of this method of approach. The possibility of testing these assumptions on the same data should not be overlooked and will be referred to again. It should be noted that the general method of analysis suggested here will not only be useful in investigating the problem of correction for guessing but might well provide an objective method for examining certain factors thought to influence test performance. A simple set of assumptions is given below.

1. At the first administration, all responses are either known correctly, guessed, or "known" incorrectly.

2. At the second administration, all responses are either known correctly, guessed, "known" incorrectly, or repeated from memory.

3. No person who knew the correct answer at the first administration will guess at the second.

4. No person will learn an incorrect response between administrations.

The probability that a person who guesses will guess correctly is regarded as unknown but constant for the persons and items under consideration in the sense that an average figure is required. Various subdivisions of items or people or both can be examined if sufficient data are available and the corresponding average probabilities for the groups compared. The problem is to estimate this average probability.

TABLE 1

Possible Response Categories

            Type of response to an         Number of cases
Category    item on two occasions          in the category
    1       right x right                     $T_{rr}$
    2       right x wrong                     $T_{rw}$
    3       wrong x right                     $T_{wr}$
    4       wrong x same wrong                $T_{ww}$
    5       wrong x different wrong           $T_{ww'}$
    6       omit x right                      $T_{or}$
    7       omit x wrong                      $T_{ow}$
    8       omit x omit                       $T_{oo}$
    9       right x omit                      $T_{ro}$
   10       wrong x omit                      $T_{wo}$


N otation 
1/k = the probability of success ei she correct answer Р 
= = the number of occasions subject 


both administrations. 


70 


PSYCHOMETRIKA 


the number of occasions subjects guess at the first adminis- 
tration and know the answer at the second. 

the number of occasions subjects guess at the same item at 
both administrations. 

the number of occasions subjects guess at the first adminis- 
tration and repeat the same response from memory at the 
second. 

the number of occasions subjects “know” the same incorrect 
answer at both administrations. 

the number of occasions subjects “know” an incorrect answer 


at the first administration and know the correct answer at the 
second. 


Using this notation as well as that of Table 1 with the assumptions made, it follows that $T_{rw}$, the number of occasions subjects gave the correct answer on the first occasion and an incorrect answer on the second occasion, will equal the product of u, 1/k, and (k - 1)/k, where the last term is the probability of guessing a wrong answer the second time. Thus,

(1)    $T_{rw} = \dfrac{(k-1)u}{k^2}.$

In a similar way the following four equations can be derived. 


ТЕ 


п (k — 1)! : — 
Ta, MH wyi E Du a y. 


ld (k = Du " (k - um "TT 


(5)    $T_{ww'} = \dfrac{(k-1)(k-2)u}{k^2}.$

From (1) and (5), k and u can be estimated.

(6)    $k = \dfrac{T_{ww'}}{T_{rw}} + 2.$

(7)    $u = \dfrac{k^2 T_{rw}}{k-1} = \dfrac{(T_{ww'} + 2T_{rw})^2}{T_{ww'} + T_{rw}}.$

Although the remaining constants cannot be estimated from the data, it is clear that the difference $T_{wr} - T_{rw}$ is related to the amount of learning during and between the testings, and the difference $T_{ww} - T_{rw}$ is related to the extent of fixation on a particular wrong response. If the material in the


test is of an unfamiliar nature it might be safe to assume that there is no 
prior knowledge and thus that x and y are both zero. In this case s, ^ and m 


can be obtained explicitly with the following result: 


(8) pe urat) 
(9) b= (Ter — Т.) — 1), 

and 

ao m = (Low = Т) — D. 


A second estimate of k can be obtained by considering patterns of responses involving the omission of a response to an item at either or both testings.


Notation

z = the number of occasions a person omits at the first administration and knows the answer at the second.

a = the number of occasions a person omits at the first administration and guesses at the second.

b = the number of occasions a person omits at both administrations.

c = the number of occasions a person guesses at the first administration and omits at the second administration.

A further assumption is required which is in line with the assumptions already listed. It is assumed that persons who know the answer at the first administration will not omit a response at the second administration.

With this assumption and the notation already given, it is possible to derive five more equations in the way illustrated above.

(11)    $T_{or} = z + a/k.$

(12)    $T_{ow} = (k-1)a/k.$

(13)    $T_{oo} = b.$

(14)    $T_{ro} = c/k.$

(15)    $T_{wo} = (k-1)c/k.$

The solutions for the unknown quantities are as follows:

(16)    $k = \dfrac{T_{wo}}{T_{ro}} + 1.$

(17)    $z = T_{or} - \dfrac{T_{ow}T_{ro}}{T_{wo}}.$

(18)    $b = T_{oo}.$

(19)    $a = \dfrac{T_{ow}(T_{ro} + T_{wo})}{T_{wo}}.$

(20)    $c = T_{ro} + T_{wo}.$

With some tests and under certain conditions of administration, the total number of times a person omits an item may be insufficient to give reliable estimates of the constants. In particular, the estimate of k might be based on a relatively small number of cases. This may not be unsatisfactory if this estimate is being calculated only as a check on the value obtained by the method which does not consider omitted items, but it must be noted that in the case of two-choice items this is the only method of estimating k.

Since the primary interest of this type of investigation is the estimation 
of k, it is important to examine the nature of this estimate. For this purpose 
consider a person who is guessing between n alternatives for a number of 
items. Let b = k, , where k, is the estimate of k obtained from (6). 


(21) = 2 S au Tus 


This procedure can be repeated for further groups of items provided
that within each group the subject is guessing from the same number of
alternatives. In practice it is not possible to isolate these groups. The method
outlined above yields an average of the following kind:


(22) jaga ED 
It may be difficult to justify this method of averaging over others that
suggest themselves in theory. In practice this is the type of average that is
given by the present method, and no more satisfactory method has so far
been devised for estimating k.


An Illustrative Example 
To illustrate the type of results obtained by this approach, data were
analyzed for 78 cases from two schools. Each subject had been given two
administrations of each of two tests with a period of one week between
administrations. The tests used were a mixed verbal and number general
ability test (A.C.E.R. Intermediate D) and a nonverbal test involving
problems with line figures (Jenkins Test). The frequency of all possible pairs
of responses to a given item was tallied, but as there were very few occasions
on which an item was omitted, response categories involving an omission are
not presented. In Table 2 appear the frequencies for the two tests.

The value of k obtained by applying (22) to these data is 3.6 for Intermediate D
and 3.5 for Jenkins Nonverbal. Thus, although these tests both
involved five-choice items, the effective number of choices appears to be



TABLE 2 
Summed Frequencies in Response Categories 
for Illustrative Example 
DI ЖТ, ZTwr ZT hw ET wa Total 
Intermediate D. 1330 183 285 472 293 2563 
Nonverbal 3405 407 939 565 598 5914 


about three and one-half as an average over persons and items. A point of 
contrast between the two tests is suggested by the relatively high value of 
Tov and low value of 7. for Intermediate D as contrasted with Jenkins 
Nonverbal. This result suggests that the familiar verbal and number items
involved more misconceptions and recall of wrong responses than the unfamiliar
items involving classification of line figures. The latter items,
however, showed a greater amount of learning between trials.

It is stressed that these results are presented to illustrate the method
and not to prove anything about the tests. The number of persons is small
and the time between administrations is longer than would ideally be used.
However, the results obtained do not appear unreasonable and indicate
that further studies of this kind would be worthwhile.




Manuscript received 8/17/56
Revised manuscript received 4/15/57


PSYCHOMETRIKA—VOL. 23, NO. 1
March, 1958 


THE MEASUREMENT OF FUNCTION FLUCTUATION 


R. F. GARSIDE 
UNIVERSITY OF DURHAM, ENGLAND. 


A method of measuring function fluctuation is suggested and an
appropriate test of significance is indicated. The proposed method is compared
with bi-factor analysis and with some other suggested methods of
measuring function fluctuation.


The literature on function fluctuations has recently been summarized
by Anderson [1]. He considers the various methods which have been proposed
and concludes that those suggested by Thouless [12] and Finney [6] not only
give similar results but are the best simple methods. Mahmoud ([9], p. 131),
however, has stated that Thouless's index of function fluctuation gives
results which "seem far too high." Moreover, Finney has intimated [4] that
his paper, which Anderson [1] refers to, was a "hurriedly prepared private
document" not intended for published discussion.

The accuracy of psychological prediction is limited by the amount of
fluctuation in the mental function under investigation. The measurement of
such fluctuation is therefore important. Yet it appears that there is no general
agreement as to how function fluctuation is best measured. To propose a
method of measuring it is the purpose of the present paper.


Definition of Function Fluctuation 

Suppose that a group of people are tested on two occasions, that the
tests measure a common factor, g, and that the true g scores obtained on
each occasion, $g_1$ and $g_2$, are standardized so that the variance of $g_1$ equals
that of $g_2$. By fluctuation in function, we mean that the changes in true g
scores between occasions, $(g_2 - g_1)$, are not constant for all testees. If $(g_2 - g_1)$
is constant, then the function is stable.

Mahmoud ([9], p. 130) refers to such function stability as person stability.
It is admitted that person instability is probably a better phrase than function
fluctuation, because unequal fluctuations in function are implied rather than
fluctuations as such. Nevertheless the term function fluctuation will be
used since it has generally been used in the past to indicate this concept.


Coefficient of Function Stability and of Function Fluctuation

Define the coefficient of function stability, $R_{FS}$, as the ratio of stable
variance in the general factor to $V_g$, the general factor variance common to
the same tests given on a single occasion. The coefficient of function
fluctuation, $R_{FF}$, may be defined as $1 - R_{FS}$. Thus,

(1)  $R_{FF} = 1 - R_{FS} = \dfrac{V_g - V_s}{V_g}$,

where $V_s$ = the variance of s, the stable part of $g_1$ and $g_2$.
Now suppose that, for each person tested, there is a series of true g
scores, each g score being obtained on a different occasion. Then, for a person
i,

(2)  $g_{ip} = s_i + d_{ip}$,

where $g_{ip}$ = g score of person i on occasion p,
      $s_i$ = stable score of person i,
      $d_{ip}$ = score of person i associated with occasion p.

On each occasion, a set of g scores will be obtained. We may postulate
that these sets of g scores are all parallel to each other. Then, if $s_i$ is defined as

(3)  $s_i = \lim_{n \to \infty} \dfrac{\sum_{p=1}^{n} g_{ip}}{n}$,

Gulliksen ([7], pp. 28-31) has shown that

(4)  $V_s = r_{g_1 g_2} V_g$,

where $V_s$ = variance of stable scores,
      $V_g$ = variance of the set of g scores obtained on any one occasion,
      $r_{g_1 g_2}$ = correlation between any two such sets of g scores.

Thus, to consider two such sets (or occasions),

(5)  $r_{g_1 g_2} = \dfrac{V_s}{V_g} = R_{FS}$.

Neither $R_{FS}$ nor $r_{g_1 g_2}$ can be negative. If $R_{FS} = 0$, $R_{FF} = 1$, and function
fluctuation is at a maximum. It should be noted that $g_1$ and $g_2$ refer to true
g scores. Thus, $R_{FF}$ and $R_{FS}$ are independent of errors of measurement and,
therefore, they indicate the extent to which function fluctuation, as such,
limits the accuracy of psychological prediction.

In order to measure $r_{g_1 g_2}$, and accordingly $R_{FS}$ and $R_{FF}$, the plan of
using a number of different, not parallel, tests will be adopted. At least two
tests must be given on one occasion and at least two other tests on a subsequent
occasion. Hence the number of tests must be four or more. No test is given
twice, but the same testees take all the tests. This plan differs from that of
Thouless [12], who suggests giving two tests twice. It also differs from Dunlap's
[3] plan of using four parallel tests given on two occasions.

An essential part of the proposed plan is that the tests must be chosen




so that, when all the tests are given to a separate group of testees on one
occasion, they measure one general factor and no group factors. Whether the
intercorrelations so obtained are consistent with this requirement may be
ascertained by carrying out a factor analysis or calculating tetrad differences
and applying the appropriate tests of significance. An exact test of the
significance of tetrad differences has been given by Wishart [14]. In our
design, the tests given on the first occasion must not be parallel to those
subsequently given, unless all the tests are parallel to each other. Their means,
standard deviations, reliability coefficients and specific factor loadings may
all differ from test to test.

Strictly speaking, the tests given at the same occasion should be
administered simultaneously. This may be achieved by combining the tests
into a composite test, each subtest providing items in rotation. It should be
remembered, however, that such an arrangement is sound only if the tests
are power rather than speed tests. If speed is an important factor, the tests
must be given separately.

To simplify the derivation, consider the case when only four tests are
used. The derivation may easily be extended to cover five or more tests.
If A, B, C, and D represent true scores of the tests and if A and B are obtained
at the first occasion and C and D at the second testing then, since the general
factor, g, is the sole source of correlation between the tests,

(6)  $r_{AB} = r_{A g_1} r_{B g_1}$

and

(7)  $r_{CD} = r_{C g_2} r_{D g_2}$.

But $g_1$ is the sole source of correlation between $g_2$ and A or B. Therefore

(8)  $r_{AC} = r_{A g_1} r_{g_1 g_2} r_{C g_2}$,

(9)  $r_{AD} = r_{A g_1} r_{g_1 g_2} r_{D g_2}$,

(10)  $r_{BC} = r_{B g_1} r_{g_1 g_2} r_{C g_2}$,

and

(11)  $r_{BD} = r_{B g_1} r_{g_1 g_2} r_{D g_2}$.

Substituting (5) in (8), (9), (10), and (11) and multiplying,

(12)  $r_{AC} r_{AD} r_{BC} r_{BD} = R_{FS}^4 \, r_{A g_1}^2 r_{B g_1}^2 r_{C g_2}^2 r_{D g_2}^2$.

Substituting (6) and (7) in (12),

(13)  $R_{FS}^4 = \dfrac{r_{AC} r_{AD} r_{BC} r_{BD}}{r_{AB}^2 r_{CD}^2}$.




Multiplying numerator and denominator of (13) by the variances of A, B,
C, and D,

(14)  $R_{FS}^4 = \dfrac{C_{AC} C_{AD} C_{BC} C_{BD}}{C_{AB}^2 C_{CD}^2}$,

where C indicates covariance.

If it is assumed that errors of measurement are uncorrelated with one
another or with true scores, then the covariance between the true scores of
any two tests equals the covariance between the obtained scores. Thus (14)
becomes

(15)  $R_{FS} = \dfrac{(C_{ac} C_{ad} C_{bc} C_{bd})^{1/4}}{(C_{ab} C_{cd})^{1/2}} = \dfrac{(r_{ac} r_{ad} r_{bc} r_{bd})^{1/4}}{(r_{ab} r_{cd})^{1/2}}$,

where a, b, c, and d refer to obtained scores.

Should $r_{ab} r_{cd}$ or $r_{ac} r_{ad} r_{bc} r_{bd}$ be negative, it merely means that the test
scores of one or more tests have been inverted. Equation (15) is similar in
form to Yule's attenuation formula (Spearman [11], p. 294). The coefficient
of function fluctuation, $R_{FF}$, is given by

(16)  $R_{FF} = \dfrac{(C_{ab} C_{cd})^{1/2} - (C_{ac} C_{ad} C_{bc} C_{bd})^{1/4}}{(C_{ab} C_{cd})^{1/2}} = \dfrac{(r_{ab} r_{cd})^{1/2} - (r_{ac} r_{ad} r_{bc} r_{bd})^{1/4}}{(r_{ab} r_{cd})^{1/2}}$.

If five tests are used a similar derivation gives

(17)  $R_{FS} = \dfrac{(r_{13} r_{14} r_{15} r_{23} r_{24} r_{25})^{1/6}}{r_{12}^{1/2} (r_{34} r_{35} r_{45})^{1/6}}$,

where tests 1 and 2 are given on the first occasion and tests 3, 4, and 5 on a
subsequent occasion. There is no difficulty in deriving $R_{FS}$ for six or more tests.
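
As a numerical check on (15) and (16), the following is a minimal sketch in Python, which is not part of the original paper. It builds the six population correlations implied by (6) through (11) and confirms that the four-test formula recovers the stability coefficient. The loadings and the stability value of 0.85 are hypothetical illustration values, and the function names are assumptions made here.

```python
import math

def function_stability(r_ab, r_ac, r_ad, r_bc, r_bd, r_cd):
    """Coefficient of function stability, R_FS, in the form of equation (15).

    Tests a and b are given on the first occasion, c and d on the second.
    """
    cross_occasion = r_ac * r_ad * r_bc * r_bd   # four between-occasion correlations
    same_occasion = r_ab * r_cd                  # two within-occasion correlations
    return cross_occasion ** 0.25 / math.sqrt(same_occasion)

# Hypothetical general-factor loadings of the four tests (illustration only).
loading = {"a": 0.8, "b": 0.7, "c": 0.6, "d": 0.9}
true_stability = 0.85  # hypothetical correlation between g on the two occasions

def model_r(x, y):
    """Population correlation implied by equations (6) to (11)."""
    same = {x, y} <= {"a", "b"} or {x, y} <= {"c", "d"}
    r = loading[x] * loading[y]
    return r if same else r * true_stability

r_fs = function_stability(
    model_r("a", "b"), model_r("a", "c"), model_r("a", "d"),
    model_r("b", "c"), model_r("b", "d"), model_r("c", "d"),
)
r_ff = 1.0 - r_fs  # coefficient of function fluctuation, equation (16)
print(round(r_fs, 3), round(r_ff, 3))
```

Because the loadings cancel between numerator and denominator, the recovered value does not depend on how strongly each test measures g, which is why the tests need not be parallel.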


Mean of $R_{FS}$ and of $R_{FF}$

The question now arises as to whether $\bar{R}_{FS}$ and $\bar{R}_{FF}$, the mean values
obtained from samples, provide unbiased estimates of $R_{FS}$ and $R_{FF}$, the
population parameters. Wishart ([14], pp. 184-185) has shown that, when N
is large, both $C_{ac} C_{ad} C_{bc} C_{bd}$ and $C_{ab} C_{cd}$ approach the corresponding population
parameters. Thus $\bar{R}_{FS}$ and $\bar{R}_{FF}$ provide satisfactory estimates of $R_{FS}$ and
$R_{FF}$, respectively, when N is large.


Significance of $R_{FF}$

If the function tested fluctuates between testings, then the intercorrelations
between tests will reflect not only a general factor, but also group
factors associated with occasions. This was pointed out by Dunlap ([3],
p. 448). Thus the significance of $R_{FF}$ may be tested by simply ascertaining
the significance of the appropriate tetrad differences in the usual way (Wishart
[14]). When four tests are given, these differences are $r_{ab} r_{cd} - r_{ac} r_{bd}$ and
$r_{ab} r_{cd} - r_{ad} r_{bc}$.

It is therefore unnecessary to derive the standard error of $R_{FF}$ or of $R_{FS}$.
If, however, the standard error of $R_{FS}$ is required, it may easily be derived
by taking logarithmic differentials (Kelley [8], p. 526) and by using Wishart's
[13] moments. These are reported by Kelley ([8], p. 555).
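
The behaviour of the two tetrad differences can be illustrated with a short Python sketch, which is not part of the original paper. It uses hypothetical population correlations generated from assumed loadings. It shows only that both differences vanish when the function is perfectly stable and become positive when it fluctuates. It does not implement Wishart's significance test, which requires the sampling moments cited above.

```python
def tetrad_differences(r_ab, r_ac, r_ad, r_bc, r_bd, r_cd):
    """The two tetrad differences named in the text for the four-test design."""
    return (r_ab * r_cd - r_ac * r_bd, r_ab * r_cd - r_ad * r_bc)

def correlations(stability, la=0.8, lb=0.7, lc=0.6, ld=0.9):
    """Hypothetical population correlations.

    la..ld are assumed general-factor loadings; tests a and b belong to the
    first occasion, c and d to the second, and every between-occasion
    correlation is reduced by the stability coefficient.
    """
    return dict(
        r_ab=la * lb,
        r_cd=lc * ld,
        r_ac=la * lc * stability,
        r_ad=la * ld * stability,
        r_bc=lb * lc * stability,
        r_bd=lb * ld * stability,
    )

stable = tetrad_differences(**correlations(1.0))        # no function fluctuation
fluctuating = tetrad_differences(**correlations(0.85))  # some function fluctuation
print(stable, fluctuating)
```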


Bi-factor Analysis 

It has been suggested that a bi-factor analysis carried out on tests given 
on different occasions would indicate the extent of function fluctuation. 
Such an analysis has, in fact, been carried out by Ferguson [5]. He gave three 
parallel tests to the same group of testees, one test being given on each of 
three occasions. He then calculated the fifteen correlations between the 
halves of each test and carried out a bi-factor analysis. He concluded that, 
“It is not unlikely that both the correlation of errors and functional varia- 
bility are exerting a positive influence on the size of the group factors, and 
since no method of determining the relative importance of these two influences 
is at the moment apparent, it is only possible to describe these factors as 
factors of temporal contiguity.” But when a bi-factor analysis is carried
out on correlations among tests designed and administered as described in
this paper, then the size of the group factor loadings will be affected only by
function fluctuation.

For the sake of simplicity, again consider the case of four tests only,
even though this number of tests would be, of course, insufficient to carry
out a satisfactory factor analysis. It is assumed that when the four tests
are given at the same time, they measure a general factor and no group
factors. Thus, when the tests are given in pairs on two different occasions
and a bi-factor analysis is carried out, two group factors associated with
occasions and a general factor will be obtained.

Note that it is sometimes supposed that it is not possible to carry out a
bi-factor analysis with two group factors only, unless there is at least one
test included which involves neither group factor. But Burt ([2], p. 56) has
indicated a method by which such an analysis may be carried out when every
test has a factor loading on one or the other of the two group factors.

According to our definition of the coefficient of function stability, it
equals the ratio of the proportion of test variance attributable to the general
factor to the proportion attributable to both general and group factors. If
sampling errors are ignored, then this ratio will be constant for all tests,
since they measure the same general factor. Thus,

(18)  $R_{FS} = \dfrac{g_a^2}{g_a^2 + p_a^2} = \dfrac{g_b^2}{g_b^2 + p_b^2} = \dfrac{g_c^2}{g_c^2 + q_c^2} = \dfrac{g_d^2}{g_d^2 + q_d^2}$,

where $g_a$, $g_b$, $g_c$, and $g_d$ are the general factor loadings of the four tests, $p_a$
and $p_b$ are the first group factor loadings, and $q_c$ and $q_d$ the second group
factor loadings. Therefore

(19)  $R_{FS}^4 = \dfrac{g_a^2 g_b^2 g_c^2 g_d^2}{(g_a^2 + p_a^2)(g_b^2 + p_b^2)(g_c^2 + q_c^2)(g_d^2 + q_d^2)} = \dfrac{g_a^2 g_b^2 g_c^2 g_d^2}{(g_a^2 g_b^2 + g_a^2 p_b^2 + g_b^2 p_a^2 + p_a^2 p_b^2)(g_c^2 g_d^2 + g_c^2 q_d^2 + g_d^2 q_c^2 + q_c^2 q_d^2)}$.

But, from (18),

(20)  $g_a p_b = g_b p_a$,

and therefore,

(21)  $g_a^2 p_b^2 + g_b^2 p_a^2 = 2 g_a g_b p_a p_b$.

Similarly

(22)  $g_c^2 q_d^2 + g_d^2 q_c^2 = 2 g_c g_d q_c q_d$.

Substituting (21) and (22) in (19),

(23)  $R_{FS}^4 = \dfrac{g_a^2 g_b^2 g_c^2 g_d^2}{(g_a g_b + p_a p_b)^2 (g_c g_d + q_c q_d)^2}$.

If scores a, b, c, and d are obtained as indicated previously, and if sampling
errors are again ignored, then

(24)  $r_{ab} = g_a g_b + p_a p_b$,

(25)  $r_{cd} = g_c g_d + q_c q_d$,

(26)  $r_{ac} = g_a g_c$,

(27)  $r_{ad} = g_a g_d$,

(28)  $r_{bc} = g_b g_c$,

and

(29)  $r_{bd} = g_b g_d$.

Therefore, substituting (24) to (29) in (23),

(30)  $R_{FS} = \dfrac{(r_{ac} r_{ad} r_{bc} r_{bd})^{1/4}}{(r_{ab} r_{cd})^{1/2}}$.


Equations (30) and (15) are identical. Therefore, apart from possible differences
arising from sampling errors, the method proposed in a preceding
section and bi-factor analysis provide equal estimates of the coefficients of
function stability and fluctuation. It can be shown, in a similar manner,
that this is also true when more than four tests are used. But the proposed
method is simpler to carry out.
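
The equivalence of the bi-factor route and the correlation formula can be checked numerically. The Python sketch below is not part of the original paper. It assumes hypothetical general-factor loadings and a hypothetical stability of 0.8, chooses group-factor loadings so that the ratio in (18) is the same for every test, and compares the stability computed from the loadings with that computed from the implied correlations.

```python
import math

stability = 0.8  # hypothetical R_FS
general = {"a": 0.70, "b": 0.60, "c": 0.50, "d": 0.65}  # hypothetical loadings on g

# Group-factor loadings chosen so that g^2 / (g^2 + group^2) equals the same
# stability value for every test, which is the condition stated in (18).
group = {t: g * math.sqrt((1.0 - stability) / stability) for t, g in general.items()}

# Correlations implied by the bi-factor model, equations (24) to (29).
r_ab = general["a"] * general["b"] + group["a"] * group["b"]  # same occasion
r_cd = general["c"] * general["d"] + group["c"] * group["d"]  # same occasion
r_ac = general["a"] * general["c"]
r_ad = general["a"] * general["d"]
r_bc = general["b"] * general["c"]
r_bd = general["b"] * general["d"]

# Stability read directly from the loadings of test a, as in (18).
from_loadings = general["a"] ** 2 / (general["a"] ** 2 + group["a"] ** 2)

# Stability from the correlations alone, as in (30) and (15).
from_correlations = (r_ac * r_ad * r_bc * r_bd) ** 0.25 / math.sqrt(r_ab * r_cd)

print(round(from_loadings, 6), round(from_correlations, 6))
```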


Comparison with other Coefficients 


Paulsen [10] suggested correcting the retest reliability coefficient for
attenuation due to test error using the split-half reliability coefficient as
the correction factor. The coefficient obtained by this procedure will measure
function stability, but Paulsen called it the coefficient of “trait variability.”
This coefficient is essentially similar to the proposed coefficient, $R_{FS}$. The
proposed coefficient, however, would seem to be superior in that it utilizes
more information from the same amount of testing and does not involve the
split-half reliability, which does not always provide a satisfactory measure
of test error.


Thouless [12] suggests using two tests twice in order to test for and
measure function fluctuation. In our notation tests a and c would be the
same test administered at different times and so would be tests b and d.
Thouless seems to mean the same as we do by function fluctuation and, in
fact, points out that if

(31)  $r_{ab} r_{cd} - r_{ad} r_{bc} > 0$,

then function fluctuation exists. This tetrad difference is the same as one of
the pair used in testing for function fluctuation. But Thouless considers that
this purpose may be more simply achieved by calculating $r_{(a-c)(b-d)}$. If
this correlation is positive, then function fluctuation exists.

To obtain his index of function fluctuation, Thouless divides $r_{(a-c)(b-d)}$
by the mean of $r_{ab}$ and $r_{cd}$. Accordingly

(32)  $I_{FF} = \dfrac{2 r_{(a-c)(b-d)}}{r_{ab} + r_{cd}}$,

where $I_{FF}$ is Thouless's index of function fluctuation. He assumes
that the standard deviations of a and c and of b and d are equal. He thus
obtains

(33)  $I_{FF} = \dfrac{r_{ab} + r_{cd} - r_{ad} - r_{bc}}{(r_{ab} + r_{cd}) \sqrt{(1 - r_{ac})(1 - r_{bd})}}$.

$I_{FF}$ cannot be directly compared with $R_{FF}$ because, the same tests having been
given twice, (8) and (11) no longer hold. It is possible, however, to derive
a coefficient $R'_{FF}$ comparable to Thouless's index. Equations (6)
and (7), (9), and (10) will still apply to the data, and $R'_{FF}$ may be
derived in a manner similar to that of $R_{FF}$:

(34)  $R'_{FF} = \dfrac{(r_{ab} r_{cd})^{1/2} - (r_{ad} r_{bc})^{1/2}}{(r_{ab} r_{cd})^{1/2}}$.



Apart from the factor $\sqrt{(1 - r_{ac})(1 - r_{bd})}$, $I_{FF}$ only differs from $R'_{FF}$ in
that $I_{FF}$ is a function of arithmetic means of pairs of correlation coefficients
whereas $R'_{FF}$ is a function of their geometric means. But test a is the same as
test c, and test b is the same as test d. Therefore, within the limits of sampling
error,

(35)  $r_{ab} = r_{cd}$

and

(36)  $r_{ad} = r_{bc}$.

Thus

(37)  $I_{FF} \sqrt{(1 - r_{ac})(1 - r_{bd})} = R'_{FF}$.

In practice, the factor $\sqrt{(1 - r_{ac})(1 - r_{bd})}$ will be less than unity and
will therefore make $I_{FF}$ greater than $R'_{FF}$; it appears to be an unnecessary
complication. Moreover, as Mahmoud ([9], p. 131) remarks, Thouless's
index gives results which seem too high.

Mahmoud [9] considers the case where several tests are given and then 
repeated in the same or parallel form. He derives a coefficient of person 
stability, which may be calculated from any number of tests. In the case of 


two tests only (i.e., four applications) his coefficient reduces to ([9], p. 129,
equation xvii)

(38)  $R_{SP} = \dfrac{r_{ad} + r_{bc}}{r_{ab} + r_{cd}}$,

where a is parallel to c and b is parallel to d. $R_{SP}$ cannot be compared directly
with $R_{FS}$ because Mahmoud uses parallel tests. But a coefficient $R'_{FS}$ may
be derived, in the same way as $R'_{FF}$, which will be comparable to $R_{SP}$:

(39)  $R'_{FS} = \dfrac{(r_{ad} r_{bc})^{1/2}}{(r_{ab} r_{cd})^{1/2}}$.

The correlations $r_{ab}$ and $r_{cd}$, and also $r_{ad}$ and $r_{bc}$, will again be approximately
equal. Therefore $R'_{FS}$ will give similar results to those of $R_{SP}$, within the
limits of sampling errors. The proposed coefficient $R_{FS}$, however, seems to
provide a more direct indication of the extent to which prediction is limited
by function fluctuation. Moreover, by avoiding the use of parallel tests, $R_{FS}$
utilizes more information from the same amount of testing than does $R_{SP}$.
It is interesting that giving the same tests twice, or using parallel tests, seems
to be a disadvantage in measuring function fluctuation.

Mahmoud ([9], p. 129) states that $R_{SP}$ “measures the extent to which
the relative abilities of a given set of persons, assessed on two or more separate
days, have remained the same, in spite of the interval between the two applications
or (particularly if the interval is short) in spite of the variations in the
conditions that obtained." In order, therefore, to obtain a coefficient of trait
variability, $R_{TV}$, Mahmoud subtracts $R_{SP}$, not from unity, but from his
coefficient of internal consistency. This coefficient depends upon errors of
measurement, and therefore so does $R_{TV}$. The proposed coefficient, $R_{FF}$,
is independent of such errors and for our purpose, therefore, would seem to
be more appropriate than $R_{TV}$. It is true that variations in conditions may
tend to reduce $R_{FS}$, but this effect may be minimized by careful test
administration.


Example 


For an example, some of Mahmoud's data ([9], p. 121, Table II) will be
used: $r_{ab}$ = .713, $r_{ac}$ = .881, $r_{ad}$ = .037, $r_{bc}$ = .559, $r_{bd}$ = .670, and $r_{cd}$ = .788
(N = 87). From these data, Thouless's coefficient $I_{FF}$ = .878. Without the
factor $\sqrt{(1 - r_{ac})(1 - r_{bd})}$, $I_{FF}$ would equal .174. It is evident that this
factor has a considerable effect, making $I_{FF}$ much greater than $R'_{FF}$, which
equals .176.
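
The size of the effect attributed to Thouless's factor can be verified from the figures quoted in the example. The following Python sketch, which is not part of the original paper, takes the two retest correlations and the value of the index without the factor as given in the text, and recomputes the index with the factor included. The variable names are assumptions made here.

```python
import math

r_ac, r_bd = 0.881, 0.670      # retest correlations quoted in the example
index_without_factor = 0.174   # value of the index without the factor, as quoted

# Thouless's factor, the square root of (1 - r_ac)(1 - r_bd).
factor = math.sqrt((1.0 - r_ac) * (1.0 - r_bd))

# Dividing by a factor smaller than unity inflates the index.
i_ff = index_without_factor / factor
print(round(factor, 3), round(i_ff, 3))
```

With these inputs the factor is roughly one fifth, so the index is inflated about fivefold, in agreement with the quoted value of .878.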

Mahmoud's $R_{SP}$ = .826, and the coefficient $R'_{FS}$ = .824. The results
obtained from $R_{SP}$ and $R'_{FS}$ are very similar. The proposed coefficients
$R_{FS}$ and $R_{FF}$, however, include more information from a given amount of
testing than does $R_{SP}$, and their derivation is more direct than that of $R_{SP}$.
Moreover, the proposed coefficients do not entail giving the same tests
twice or the use of parallel tests.

REFERENCES

[1] Anderson, C. C. Some simple methods of testing for function fluctuation. Brit. J. Psychol., 1955, 46, 1-12.
[2] Burt, C. Group factor analysis. Brit. J. statist. Psychol., 1950, 3, 40-75.
[3] Dunlap, J. W. Comparable tests and reliability. J. educ. Psychol., 1933, 24, 442-453.
[4] Editorial note. Brit. J. Psychol., 1955, 46.
[5] Ferguson, G. A. A bi-factor analysis of reliability coefficients. Brit. J. Psychol., 1940, 31, 172-182.
[6] Finney, D. J. A note on the measurement of performance ... to the National Foundation for Educational Research in England and Wales.
[7] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.
[8] Kelley, T. L. Fundamentals of statistics. Cambridge: Harvard Univ. Press, 1947.
[9] Mahmoud, A. F. Test reliability in terms of factor theory. Brit. J. statist. Psychol., 1955, 8, 119-135.
[10] Paulsen, G. B. A coefficient of trait variability. Psychol. Bull., 1931, 28, 218-219.
[11] Spearman, C. Correlation calculated from faulty data. Brit. J. Psychol., 1910, 3, 271-295.
[12] Thouless, R. H. Test unreliability and function fluctuation. Brit. J. Psychol., 1936, 26, 325-343.
[13] Wishart, J. The generalized product moment distribution in samples from a normal multivariate population. Biometrika, 1928, 20A, 32-52.
[14] Wishart, J. Sampling errors in the theory of two factors. Brit. J. Psychol., 1928, 19, 180-187.
Manuscript received 11/19/56
Revised manuscript received 4/1/57


PSYCHOMETRIKA—VOL. 23, NO. 1 
MARCH, 1958 


PREDETERMINATION OF TEST WEIGHTS 


PAUL J. HOFFMAN


THE STATE COLLEGE OF WASHINGTON*


Given test A with known variance and reliability, one frequently wishes 
to construct a second test, B, such that the relative weights of the two tests for 
determining a composite score will be of some predetermined magnitude. 
Where test B can be experimentally pretested, item analysis procedures 
designed to control the standard deviation and reliability of the test can be 
applied ([1], pp. 375-380). If item parameters cannot be obtained in advance, 
the usual practice is to construct test B without regard to the problem of 
weighting and to apply some transformation to the scores after the test is 
administered and the test parameters determined. 

In many applications, and particularly in the classroom, the person
responsible for evaluation is not prepared to engage in what seems to him to
be high-powered statistical manipulations. What is wanted is a way of arriving
at a composite for each individual member of his class by simply totaling
the various part scores. For this reason, an attempt is often made to predetermine
weights by controlling the number of items in each test. It has
been shown ([1], pp. 336-341) that the number of items in a test is not a
necessary determinant of test weight, a fact which might seem to rule out
this possibility as a solution. It is not known, however, to what extent the
number of items is likely to affect test weights. Since teachers may
well continue to justify its use in the lack of strong evidence to the contrary,
it becomes important to determine the conditions under which weighting
by controlling the number of items in a test may be successfully employed,
and the conditions under which it may not.

The concept of test weight is itself not clearly defined. There are methods
for determining the contributions of two or more tests in the absence of a
criterion ([3], pp. 211-213; [4], pp. 88-90) and some suggestions for determining
whether a given test contributes more than or less than another [5]. Each
method implies a somewhat unique definition of weight. It is not our purpose
to re-examine the problem of the meaning of test weights. Instead, we consider
two definitions of test weight and develop the methods for their
predetermination on the basis of length of test.

*Now at University of Oregon.


Weighting by Standard Deviations 


It is often assumed that the effective weight of a test in relation to
another is determined by the ratio of the standard deviations of the two
tests. Thus, if test X has a standard deviation $\sigma_x$ and test Y a standard
deviation $\sigma_y$, the weight of Y in relation to X is given by $W_y = \sigma_y / \sigma_x$.

Now let us assume X is a test of unit length, and that Y is a test of
increased length, such that, in deviation scores, $y = x_1 + x_2 + \cdots + x_k$.
Then

(1)  $\sigma_y^2 = \dfrac{\sum (x_1 + x_2 + \cdots + x_k)^2}{N} = \sum_{i=1}^{k} \sigma_{x_i}^2 + \sum_{i=1}^{k} \sum_{j=1}^{k} r_{x_i x_j} \sigma_{x_i} \sigma_{x_j}$,  $(i \ne j)$.

If it is assumed that the components of Y are parallel forms, one may
substitute as follows:

$\sigma_{x_i} = \sigma_{x_j} = \sigma_x$;  $r_{x_i x_j} = r_{xx}$,

so that from (1),

$\sigma_y^2 = k \sigma_x^2 + k(k - 1) r_{xx} \sigma_x^2$.

But

$W_y^2 = \dfrac{\sigma_y^2}{\sigma_x^2} = \dfrac{k \sigma_x^2 + k(k - 1) r_{xx} \sigma_x^2}{\sigma_x^2} = k + k(k - 1) r_{xx}$.

Therefore,

(2)  $W_y = \sqrt{k + k(k - 1) r_{xx}}$.

From (2) it is seen that the effective weight of a test varies directly
as a function of test length and reliability. If the reliability of the unit test
is 1.00,

(3)  $W_y = \sqrt{k + k(k - 1)} = \sqrt{k + k^2 - k}$,

     $W_y = k$.

If the reliability of the unit test is zero,

(4)  $W_y = \sqrt{k}$.




Considering (3) and (4), the inequality

$\sqrt{k} \le W_y \le k$

makes the dependence of test weight upon length obvious.

Our main concern, however, is that of finding a value for k that will
result in a predetermined weight $W_y$. To solve (2) for k, first square both
sides:

$W_y^2 = k + k(k - 1) r_{xx} = k + k^2 r_{xx} - k r_{xx}$.

Arranging terms in quadratic form,

$r_{xx} k^2 + (1 - r_{xx}) k - W_y^2 = 0$;

(6)  $k = \dfrac{-(1 - r_{xx}) \pm \sqrt{(1 - r_{xx})^2 + 4 r_{xx} W_y^2}}{2 r_{xx}}$.

Since a negative radical leads to k < 0, only one root is meaningful:

(7)  $k = \dfrac{\sqrt{(1 - r_{xx})^2 + 4 r_{xx} W_y^2} - (1 - r_{xx})}{2 r_{xx}}$.

From (7), one can estimate the relative length of a test that is required in
order to yield a given weight with respect to the unit test.
Example: Assume that the cumulated scores for an individual to the
end of the semester comprise a total of 100 test items. The reliability of the
cumulation is .70. It is desired to construct a final examination which will
equal twice the weight of the other tests. In this example,

$r_{xx} = .70$,  $W_y^2 = 4.00$,

$k = \dfrac{\sqrt{(.30)^2 + 4(.70)(4.00)} - .30}{2(.70)} = \dfrac{\sqrt{11.29} - .30}{1.40}$,

$k = 2.19$.

Then the number of items necessary on the final examination is given by
100k = (100)(2.19) = 219 items.

(It should be noted that the terms of (6) can be rearranged to yield an
expression for $r_{xx}$ in terms of $W_y$ and k. Thus,

$r_{xx} = \dfrac{W_y^2 - k}{k(k - 1)}$.

In other words, the reliability of the shortened form of a test can be
estimated from only the standard deviation of the initial test, the standard
deviation of the shortened form, and their relative length.)
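
The relations in (2) and (7), together with the rearrangement for $r_{xx}$, can be sketched in a few lines of Python. This sketch is not part of the original paper, and the function names are assumptions made here. It reproduces the worked example, in which a unit test of 100 items with reliability .70 is to be outweighed two to one.

```python
import math

def weight_from_length(k, r_xx):
    """Equation (2): weight of a test k times as long as the unit test."""
    return math.sqrt(k + k * (k - 1.0) * r_xx)

def length_for_weight(w_y, r_xx):
    """Equation (7): relative length k that yields the weight w_y."""
    if r_xx <= 0.0:
        return w_y ** 2  # limiting case (4), where W_y = sqrt(k)
    b = 1.0 - r_xx
    return (math.sqrt(b * b + 4.0 * r_xx * w_y ** 2) - b) / (2.0 * r_xx)

def unit_reliability(w_y, k):
    """Rearranged form: reliability of the unit test from W_y and k (k > 1)."""
    return (w_y ** 2 - k) / (k * (k - 1.0))

# Worked example from the text: r_xx = .70 and a desired weight of 2.
k = length_for_weight(2.0, 0.70)
items = round(100 * k)  # the unit test has 100 items
print(round(k, 2), items)
```

The positive root is the only one returned, following the remark that a negative radical leads to k < 0.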




Figure 1 is a nomograph from which k can be quickly determined for any
given $r_{xx}$ and any desired $W_y$.

FIGURE 1
Computing diagram for estimating the length of a test, Y, such that $W_y = \sigma_y / \sigma_x$.
$W_y$ = the desired weight of the test Y,
$r_{xx}$ = reliability of test X (horizontal axis, .10 to 1.00),
k = ratio of estimated length of Y to length of X.

It should be emphasized that the derivation of $W_y$ depends upon one
important assumption: that the components of Y are parallel forms of test X.
For the development of aptitude tests this may impose no significant practical


limitation, but for achievement testing the situation is different. Achievement 
testing at different stages of learning yields scores on individuals who may 
differ in their rate of learning. In addition, course content is not necessarily 
highly interdependent among its various stages. For these reasons it seems 
reasonable to doubt the comparability of two achievement tests separated 
by a period of learning, unless some empirical evidence can be offered to 
show that such a procedure makes little practical difference. We shall return 
to the empirical question in a later portion of the paper. 


Weighting by True Scores 


One major difficulty in assuming that the weight of a test is a function
of its standard deviation is that tests of low reliability will necessarily have
small standard deviations. Thus, scores of an unreliable test may be multiplied
by a constant so as to increase the test's standard deviation in relation to a
second more reliable test. The composite score thus becomes contingent
upon the more unreliable test. This difficulty has been acknowledged ([2],
pp. 385-396) but the proposals for overcoming it have been varied. A solution
that meets this objection, and one which seems to make rational sense, is
to define test weight in terms of the ratio of the standard deviations of true
scores. Thus,

(8)  $W_y = \sigma_{t_y} / \sigma_{t_x}$.


In what manner does test length affect test weight defined in this way?
Let us regard the true score on test Y as composed of tests of unit length,
X. In deviation scores,

(9)  $t_y = \sum_{i=1}^{k} t_{x_i}$.

Again assuming comparable forms among the components of Y, it
follows that the $t_{x_i}$ will be equal. Then (9) becomes

$t_y = k t_x$,

and

$\sigma_{t_y}^2 = k^2 \sigma_{t_x}^2$.

Solving for k,

$k = \sigma_{t_y} / \sigma_{t_x}$.

Substituting from (8),

(10)  $k = W_y$.

Equation (10) states that when test weight is defined as the ratio of the
standard deviations of true scores, increasing the length of the test by the
proportion k increases its weight by k also. Thus, if one wishes to write a
test that will count twice as much as a given test, he simply writes twice the
number of items. This coincides with the intuitively justified practices of
many teachers who have no knowledge of test theory. The practice can now
be seen to be statistically justified, when the assumption of parallel forms is met.
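
The result in (10) can be cross-checked against the observed-score relation in (2). The Python sketch below is not part of the original paper. It assumes two standard results that this section does not state explicitly: the Spearman-Brown formula for the reliability of a test lengthened k times, and the relation that true-score variance equals reliability times observed variance. Under these assumptions the ratio of true-score standard deviations comes out equal to k for any unit reliability.

```python
import math

def spearman_brown(r_xx, k):
    """Reliability of a test k times as long as the unit test (assumed standard formula)."""
    return k * r_xx / (1.0 + (k - 1.0) * r_xx)

def true_score_weight(k, r_xx, sd_x=1.0):
    """Ratio of true-score standard deviations of Y to X, for parallel components."""
    var_x = sd_x ** 2
    var_y = var_x * (k + k * (k - 1.0) * r_xx)  # observed variance of Y, from (2)
    r_yy = spearman_brown(r_xx, k)
    # True-score variance is taken as reliability times observed variance.
    return math.sqrt((r_yy * var_y) / (r_xx * var_x))

for k in (2, 3, 3.5):
    for r_xx in (0.3, 0.7, 0.9):
        print(k, r_xx, round(true_score_weight(k, r_xx), 6))
```

Each printed weight equals k regardless of the reliability, whereas the observed-score weight in (2) falls between the square root of k and k.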

To obtain evidence concerning the accuracy with which estimates of $W_y$
can be made, scores were obtained from midsemester and final examinations
in Introductory Psychology for a group of 54 college freshmen. Both examinations
were multiple choice, the final consisting of 105 items. Only the first
30 items of the midsemester examination were used. Successive portions of
the final examination were scored, yielding totals for each individual for
the first 30, 60, 75, 90, and 105 items. The successive scores are thus not
independent, a fact which detracts from the meaningfulness of the comparisons
but which does not invalidate them. These results are plotted in Figures
2, 3, and 4. Figure 2 compares obtained values, $W_y = \sigma_y / \sigma_x$, with values of $W_y$
estimated from (2). In this case, X is the 30-item midsemester examination,
and Y is the final exam of varying length. Figure 3 differs from Figure 2
only in that test X now consists of the first 30 items of the final examination,
and test Y is composed of the successive portions of this examination, not
including the first 30 items. It can be assumed that the difference between
these two figures is due to the fact that the assumption of comparable forms is
more nearly met for the latter situation than for the former.

FIGURE 2
Predicted and obtained weights of test Y. Predictions made on the basis of 30-item midterm
examination. Test Y consists of accumulations of items of the final examination beyond
the first thirty. (Horizontal axis: test length. Plotted series: predicted, obtained.)

FIGURE 3
Predicted and obtained weights of test Y. Predictions made on the basis of first 30 items
of final examination. Test Y consists of accumulations of items of the final examination
beyond the first thirty. (Horizontal axis: test length. Plotted series: predicted, obtained.)


FIGURE 4
Predicted and obtained weights of test Y. Test weight defined as ratio of true scores.
(Horizontal axis: test length. Plotted series: test X = first 30 items of final; test X = first
30 items of midterm; predicted.)


Figure 4 shows the predicted and obtained values of $W_y$ when defined as
a ratio of true scores, according to (8). In this case, the predicted values are
exactly proportional to test length; hence the solid diagonal line represents
these predictions. The actual obtained values for $W_y$ were in this case determined
by noting that $\sigma_{t_x}^2 = r_{xx}\sigma_x^2$ and $\sigma_{t_y}^2 = r_{yy}\sigma_y^2$. Therefore,

$W_y = \dfrac{\sigma_y \sqrt{r_{yy}}}{\sigma_x \sqrt{r_{xx}}}$.

The reliabilities were estimated from the item data, using Kuder-Richardson
Formula 20. As was apparent in a comparison of Figures 2 and 3, so too in
Figure 4, the use of the first 30 items of the final examination as test X
results in predictions which appear to be more accurate than those based
on the midsemester examination. The necessity for satisfying the assumption
of parallel forms seems again to be indicated.

It should be emphasized that the definition of parallel forms, necessary
to satisfy the conditions of the equations derived in this paper, is one
which demands only that the variances and intercorrelations be equal. We
need not say that the intercorrelations are perfect, or even that they are
high. To do so would reduce the entire question of differential
weighting to a triviality, except as it may lead to the reliability
of the composite or to the prediction of the external criterion.
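
The obtained true-score weight described above can be computed directly from the observed standard deviations and the reliabilities. The sketch below is illustrative only: the function name and the numerical inputs are hypothetical and are not taken from the study; the reliabilities would in practice come from an estimate such as Kuder-Richardson Formula 20.

```python
import math

def obtained_true_score_weight(sd_x, sd_y, r_xx, r_yy):
    """W_y = (sd_y * sqrt(r_yy)) / (sd_x * sqrt(r_xx)).

    Uses sigma_t^2 = r * sigma^2 for each test, so the ratio of true-score
    standard deviations follows from observed SDs and reliabilities.
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    if sd_x <= 0.0 or sd_y <= 0.0:
        raise ValueError("standard deviations must be positive")
    return (sd_y * math.sqrt(r_yy)) / (sd_x * math.sqrt(r_xx))

# Hypothetical inputs chosen only to illustrate the arithmetic.
example_weight = obtained_true_score_weight(sd_x=4.0, sd_y=8.0,
                                            r_xx=0.64, r_yy=0.81)
```

With these made-up inputs the observed-score ratio is 2.0, while the true-score weight is 2.25, because the longer test is also assumed to be the more reliable one.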


REFERENCES


[1] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

[2] Horst, P. The prediction of personal adjustment. SSRC Bulletin No. 48, 1941.

[3] Kelley, T. L. Interpretation of educational measurements. New York: World Book,
1927.

[4] Thurstone, L. L. The reliability and validity of tests. Ann Arbor, Michigan: Edwards
Bros., 1931.

[5] Wilks, S. S. Weighting systems for linear functions of correlated variables when there
is no dependent variable. Psychometrika, 1938, 3, 23-40.


Manuscript received 7/23/56 
Revised manuscript received 4/18/57 


RULES FOR PREPARATION OF MANUSCRIPTS FOR
PSYCHOMETRIKA


1. Send manuscripts to the Managing Editor:

LYLE V. JONES
Psychometric Laboratory
University of North Carolina
Chapel Hill, North Carolina

2. Submit three typewritten copies of the manuscript. For original copy use heavy white
typewriter paper, size 8½ x 11. Double-space the lines, leave ample space around
formulas, and allow wide margins for editorial work.

3. Accompanying the manuscript should be three copies of an Abstract of no more than
100 words, outlining the contents of the paper.

4. Tables should be submitted with the manuscript in four copies. Prepare original copy
of tables on electric typewriter, in a form suitable for photographic reproduction. The
remaining three copies need not be prepared on an electric typewriter, but should
adhere to the prescribed form.

Tables are to be numbered with Arabic numerals and referred to in the text by number,
e.g., Table 2. The heading of the table should be centered. The word "Table," on the
first line of the heading, should appear in capital letters, e.g., TABLE 2. The title,
double-spaced below the table number, should have initial letters of principal words
capitalized. Titles should be short; if two lines are required they should be single-spaced.

Double horizontal lines should separate the heading from the stubhead, a single line
should appear between the stubhead and the body of the table, and a single line should
appear at the bottom of the table. Footnotes referring to any part of the table should
be single-spaced immediately below the table. Tables appearing in Psychometrika,
1956, 21, 362-363 show a variety of examples in good form.

For the electrically typed copy of tables, heavy white paper should be used, and no
erasures should appear. Corrected entries may be pasted over errors using rubber
cement. On this copy, closely related tables should be prepared or mounted on the
same sheet in such a way that final copy will fit the journal page after reduction. If
this results in a sheet size exceeding 8½ x 11 inches, the use of mailing tubes is recommended.

5. Figures should be drawn only by an expert draftsman, about three times the size at
which they will appear. They should be on plain white paper or tracing cloth in black
India ink. They should be referred to in the text by number, e.g., Fig. 3. Each figure
caption, including the figure number and a succinct title, should be typed on a separate
sheet of paper. No such identification should appear on the front of the figure. On the
margin of the back of the figure the author should write lightly his name and the figure
number. In addition to the original copy of figures, three photographic reproductions
or rough sketches should be submitted.

6. Formulas should be numbered at the left margin with Arabic numerals in parentheses.
Careful attention should be given to the punctuation of formulas, which ordinarily
are to be regarded as parts of sentences. Symbols should be legible, and unfamiliar
symbols avoided if possible. Where they are used for the first time, they should be
defined in the margin, as "upper case Greek letter gamma." For very complicated
notations, a list for the use of the printer should be submitted.


7. Footnotes to the text should be reduced to a minimum. Formulas in footnotes should
be avoided. Footnotes should be indicated by the following symbols: * (asterisk),
† (dagger), ‡ (double dagger), § (section mark), || (parallels), ¶ (paragraph mark).
Footnotes should be typed at the bottom of the page of text to which they refer.

8. References should be segregated at the end of the article. The heading should be
"References" not "Bibliography," and should be capitalized and centered. The references
in such a list should be arranged in alphabetical order according to author's name,
and numbered with Arabic numerals in brackets. In the text references and pages
should be referred to by number: [2], [2, 6, 10], [cf. 3], [e.g., 4, 6], ([2], p. 36), (cf. [2],
[5], p. 20, eq. 13).

With only minor exceptions, the forms of citation adopted by the Board of Editors
of The American Psychological Association are used in Psychometrika. (See American
Psychological Association, Council of Editors. Publication manual of the American
Psychological Association, 1957 revision. Washington, D. C.: American Psychological
Association, 1957.) The form for a journal reference is as follows:

[1] Gulliksen, H. and Tucker, L. R. A mechanical model illustrating the scatter diagram
with oblique test vectors. Psychometrika, 1951, 16, 233-238.

The form for a book reference is as follows:

[6] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.

9. A separate sheet giving the title of the article and the author's name and professional
connection should be included with the manuscript. The author's name or professional
connection should not appear on the manuscript. There should be no reference which
would identify the author of the manuscript, e.g., "In previous work [14], the present
writer has shown that . . ." Since all such statements must be removed before the
article is sent to the editors, it will facilitate work in the editorial office if these
precautions are observed.

10. The author is urged to give careful attention to grammatical construction, spelling,
and punctuation.

11. The journal will provide 100 free offprints of each article. Additional offprints will be
available in accordance with the following schedule:

                      2 pp.   4 pp.   8 pp.   12 pp.   16 pp.   Add. 2 pp.
100 copies            $4.00   $8.00   $12.00  $16.00   $20.00   $2.00
Each additional 100   $2.00   $4.00   $ 6.00  $ 8.00   $10.00   $1.00

A blank page counts as one page. Covers: $12.00 first hundred, $5.00 each additional
hundred.


PSYCHOMETRIKA, VOL. 23, NO. 2
JUNE, 1958 


RELIABILITY FOR THE LAW OF COMPARATIVE JUDGMENT* 


HAROLD GULLIKSEN 


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 
AND 
JOHN W. TUKEY


PRINCETON UNIVERSITY 


*This research was jointly supported in part by Princeton University, the Office of
Naval Research under contract Nonr-1858(15), and the National Science Foundation
under grant NSF G-642, and in part by Educational Testing Service. Reproduction in
whole or in part is permitted for any purpose of the United States Government.
Acknowledgments are due to Ledyard Tucker in connection with the development
presented here.

In studies using the method of paired comparisons and the law of comparative
judgment, it is desirable to determine the reliability of the scales
which are obtained. For a given set of data one might like to know the extent
to which the law of comparative judgment is successful in accounting for the
total variance in the data.

Mosteller [13] has outlined a chi square test of the agreement between
the fitted proportions, p*, and the observed proportions, p; such a test labels
the discrepancy between observation and theory as either "significant" or
"nonsignificant" but does not indicate whether the variance accounted for
by the theory is large or small in relation to the total variance in the data.
This property of significance tests is well known and has been clearly
stated by Cochran [3] in his discussion of the chi square test:

The power of the test to detect an underlying disagreement between theory
and data is controlled largely by the size of the sample. With a small sample
an alternative hypothesis which departs violently from the null hypothesis
may still have a small probability of yielding a significant value of $\chi^2$. In a
very large sample, small and unimportant departures from the null hypothesis
are almost certain to be detected.

If the sample is small then the $\chi^2$ test will show that the data are "not
significantly different from" quite a wide range of very different theories,
while if the sample is large, the $\chi^2$ test will show that the data are significantly
different from those expected on a given theory even though the difference
may be so very slight as to be negligible or unimportant on other criteria.
Fisher [6] gives a good illustration of this point in his analysis of Weldon's
data on dice throws. If we test the theory that a throw of 5 or 6 has a probability
of 1/3, then chi square for Weldon's data is very large, with p of
.0001. However, a very slight change in the theory, from a probability of
.3333 to a probability of .3377, gives a quite reasonable chi square with a
p value of .3 or .4.
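
The dependence of a chi square on sample size can be made concrete with a small calculation. The sketch below is only an illustration: it treats .3377 as if it were an observed proportion and tests it against the theoretical value 1/3 at two hypothetical sample sizes, which are not Weldon's actual counts. The function name is likewise hypothetical.

```python
def chi_square_one_proportion(p_obs, p_theory, n):
    """Pearson chi square (1 degree of freedom) for one observed proportion.

    Compares observed and expected counts in the two categories
    ("success" and "failure") for a sample of size n.
    """
    expected_yes = n * p_theory
    expected_no = n * (1.0 - p_theory)
    observed_yes = n * p_obs
    observed_no = n * (1.0 - p_obs)
    return ((observed_yes - expected_yes) ** 2 / expected_yes
            + (observed_no - expected_no) ** 2 / expected_no)

THEORY = 1.0 / 3.0
OBSERVED = 0.3377          # proportion quoted in the text, used illustratively
CRITICAL_05_1DF = 3.841    # tabled chi square, 1 df, .05 level

# Hypothetical sample sizes: the same discrepancy, very different verdicts.
chi_small = chi_square_one_proportion(OBSERVED, THEORY, n=1_000)
chi_large = chi_square_one_proportion(OBSERVED, THEORY, n=100_000)
```

For a fixed discrepancy the statistic is proportional to n, so the same small departure from theory is nonsignificant in the smaller sample and significant in the larger one. This is the reason a significance test alone cannot say how much of the variation the theory accounts for.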

In order to proceed appropriately in any scientific investigation it is
likely to be necessary to answer two different questions.


i. Is it reasonable to say that random variation accounts for the difference
between theory and data?


ii. How large is this difference relative to the variation that is accounted
for by the theory?


In studying the applicability of the law of comparative judgment,
variance-component and analysis of variance techniques can provide appropriate
answers to these questions by methods outlined below and there
applied to two sets of data on handwriting specimens and to Mosteller's
[13] baseball data.


The Data of the Example 


The handwriting specimens were chosen from the Ayres [1] handwriting
scale. This scale consists of a series of handwriting specimens of nine different
scale levels, numbered from 10 (the lowest) to 90 (the highest). Each of
these scale values is represented by three specimens, a "vertical" style (a),
a normal slant (b), and an extreme slant (c). Thus the scale consists of 27
different handwriting specimens. In conventional use, a handwriting specimen
to be scaled is judged to be like one of the scale specimens or to fall between
two of them. Thus, specimens can be scaled 10 to 90. The extremely bad or
good ones might be either below 10 or above 90, respectively. Nine of these
handwriting specimens were chosen for the present experiment: 50a, 50b, 50c,
70a, 70b, 70c, 80a, 80b, and 80c. These specimens are shown in Figure 1.
The 36 possible pairs for these nine specimens were arranged in a booklet,
with instructions for the judge to pick the better member of each pair.

It is interesting to note that one can easily develop a discussion in a class
in measurement to indicate that there are numerous criteria on which it is
possible to judge these handwriting specimens; the class will rather readily
reach the conclusion that any set of judgments would be meaningless, highly
unreliable, and unduplicatable unless one defined in great verbal detail
exactly what characteristic was to be judged, instead of simply using the term
"better handwriting." In the late 1930's this schedule was given without
preliminary discussion of the problem to 100 students at the University of
Chicago, and in the late '40's it was given, again without preliminary discussion,
to 100 students at Princeton University. The data, p, the observed
proportions, are shown in Table 1. The agreement between these two sets of
judgments for 100 people taken in different institutions about ten years
apart is rather striking.

For any pair of stimuli, i and j, the probability of a choice, p, is approximately
given by the integral of the normal curve which corresponds to
the difference of scale values interpreted as a normal deviate, fitted according
to Thurstone [14, 15] or Mosteller [13].

The two sets of scale values obtained from utilizing the law of com- 
parative judgment as stated by Thurstone [14, 15] are shown in Table 2. 
In both of these scales, stimulus 50a, the poorest one, has been chosen as 
having a scale value of zero. The fitted proportions, p*, computed from 
these scale values are given in Table 3. The scale values for the total group, 
given in Table 2, are found by summing the frequencies for the two groups 
and then proceeding to scale as for the single groups. 
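
The step from scale values to fitted proportions can be sketched as follows. This is a minimal illustration, assuming the simplest form of the law in which the difference of two scale values is read directly as a unit normal deviate; the scale values are hypothetical and are not those of Table 2, and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the unit normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fitted_proportions(scale_values):
    """Return p*_ij = Phi(s_i - s_j) for every ordered pair with i != j.

    p*_ij is read as the fitted proportion of judgments preferring
    stimulus i to stimulus j.
    """
    k = len(scale_values)
    return {
        (i, j): normal_cdf(scale_values[i] - scale_values[j])
        for i in range(k)
        for j in range(k)
        if i != j
    }

# Hypothetical scale values, with the poorest stimulus set to zero.
hypothetical_scale = [0.0, 0.5, 1.2]
p_star = fitted_proportions(hypothetical_scale)
```

Because the normal curve is symmetric, the fitted proportions for the two orders of a pair always sum to one, and a larger scale separation gives a fitted proportion further from .5.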

When Mosteller's [13] chi square test for goodness of fit is applied to
these data one finds (see Table 5, $\chi^2_D$) a chi square of about 74 for the Chicago
data, 76 for the Princeton data, and 127 for the two groups combined. The
corresponding p values are each less than .0001, the chi square value at the
.01 level being only 48. Thus, the conclusion reached would be that the data
are not fully accounted for by the law of comparative judgment. However,
it is interesting and meaningful to know whether the fraction of the systematic
variation which is not accounted for should be regarded as approximately 1
or 2 per cent or as much as 75 per cent. For example, if an aptitude test has
a validity coefficient of .5 for predicting some criterion, it is considered a very
useful test, even though it is also true that 75 per cent of the variance in the
criterion is not accounted for by the test. Under such circumstances it would
doubtless be true that the criterion contains a significant nonrandom component
that is different from anything represented by the test. Analysis of
variance and variance-component analysis procedures will give information
on the percentage of the variance which is accounted for and on the percentage
which remains to be accounted for after the law of comparative
judgment has been utilized, and will thus give coefficients which are analogous
to reliabilities. For various illustrations of analysis of components of variance
see, for example, Mood [12], Bennett and Franklin ([2], Chapter 7), Davies'
[4] discussion of "expectation of mean square" beginning in Chapter 4;
Duncan [5], especially Chapters 23 and 24, or Tippett's [16] discussion of
substantive variances in Chapters 6 and 7.


Framework of the Analysis 


Since we are dealing with proportions, the sampling variance is a function
of the true proportion as well as of the sample size, $N\sigma^2 = \pi(1 - \pi)$.
If the analysis is conducted in terms of an angular transform of each proportion,
then the binomial sampling variance is a function primarily of N,
and not of the true proportion. The angular transform of the data is defined
on different scales by different authors. The simplest scale for our purposes
is that used by Hald [9] in his table, where $\theta = 2 \arcsin \sqrt{p}$; the arc is
expressed in radians.

[TABLES 1, 2, and 3 (observed proportions p, scale values, and fitted proportions p*
for the handwriting specimens, Chicago and Princeton data) are not recoverable from
the source scan.]

The variance of $\theta$ is 1/N approximately, for proportions not too near
1 or 0. If Np and N(1 − p) both exceed 4 or 5, the approximation is quite
good. Even more extreme cases may be analyzed by the use of the averaged
angular transformation, Freeman and Tukey [8], which will be satisfactory
for Np, N(1 − p) > 1. In the other common version, tabled by Fisher and
Yates [7], $\theta_D = \arcsin \sqrt{p}$, the arc is expressed in degrees.

The variance of $\theta_D$ is approximately 821/N for proportions not too close
to 1 or 0. Thus if p = .50, $\theta = \pi/2 = 1.5708$, while $\theta_D = 45.00$. In general,

$\theta_D = \dfrac{45.00}{1.5708}\,\theta$.

If tables of $\theta_D$ are used, then, in order to fit into the pattern of Table 4, the
resulting sums of squares should be divided by 821.
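
The two versions of the angular transform, and the origin of the divisor 821, can be sketched in a few lines. The function and constant names below are illustrative and do not come from the paper; the sketch assumes only the definitions given in the text.

```python
import math

def theta_radians(p):
    """Angular transform in radians: theta = 2 * arcsin(sqrt(p))."""
    return 2.0 * math.asin(math.sqrt(p))

def theta_degrees(p):
    """Angular transform in degrees: arcsin(sqrt(p)) converted to degrees."""
    return math.degrees(math.asin(math.sqrt(p)))

def approx_variance_radians(n):
    """Approximate binomial sampling variance of theta (radian version)."""
    return 1.0 / n

def approx_variance_degrees(n):
    """Approximate binomial sampling variance of the degree version."""
    return 821.0 / n

# The degree version equals (90 / pi) times the radian version, so its
# variance is larger by the square of that factor, (180 / pi)**2 / 4.
DEGREE_VARIANCE_FACTOR = (180.0 / math.pi) ** 2 / 4.0

half_radians = theta_radians(0.5)   # pi / 2
half_degrees = theta_degrees(0.5)   # 45 degrees
```

The factor (180/π)²/4 is a little under 821, which is why sums of squares computed from the degree tables are divided by 821 to put them on the radian scale.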

The convenience of an analysis in terms of $\theta$ values lies in the fact that
for pure binomial variation the variance of any $\theta$ is substantially equal to
the reciprocal of the number of observations on which the p is based. This
property of the angular transformation allows the definition of modified
chi squares, such as the one used by Mosteller, which do not require denominators.
When necessary, we shall distinguish these modified chi squares
as angular chi squares.

For each ordered pair of stimuli, i and j, we have an observed angle $\theta$
corresponding to the observed p's of Table 1, and a fitted angle $\theta^*$ derived
from the fitted scale and corresponding to the fitted p*'s of Table 3. Because
of the symmetry of the situation the mean of the complete set of p's, or that
of the p*'s, is .50. Correspondingly, the mean of any complete set of $\theta$'s and
the mean of any complete set of $\theta^*$'s equals 1.5708.

Using angles, the analysis of variance is given in terms of the following
definitions, where the arc is measured in radians:

$\theta = 2 \arcsin \sqrt{p}$ (observed values);
$\theta^* = 2 \arcsin \sqrt{p^*}$ (fitted values);
$\bar{\theta} = 1.5708 = 2 \arcsin \sqrt{.5}$.

If all the stimuli are identical and are judged to be identical, then the
proportion of judgments i greater than j would be .5 in every case.

We treat the observed angles $\theta$ as if they were a sum of three types of
contributions. This treatment is approximate in two ways. First, as Mosteller
([13], p. 213) was careful to point out in connection with his chi square, the
fitting used is a least square fit on the normal scale but not on the angular
scale. Consequently, residuals on the angular scale will not be as small as
those resulting from a fitting procedure tailored to the angular scale. As a
consequence, our estimated "reliability" coefficients will be somewhat smaller,
just as Mosteller's chi squares are somewhat larger, than those obtainable
from more closely tailored fits. Second, the imperfect linearity of the relation
of angles to normal deviates implies that the true scale difference for any
pair compared is, when measured in angles, only approximately a difference.
For the purpose of defining variance components and reliabilities this latter
effect should not be quantitatively important. We shall use these approximations
freely, usually without further ado. Let us return to the three types
of contributions associated with a single comparison (as of two specimens of
handwriting) and contributing to the observed angle.

One contribution is approximately the difference between the true scale
values for the two stimuli, say $s_i - s_j$. These s values may be thought of as
drawn from a population with variance $\sigma_s^2$. Hence the values in the cells,
$s_i - s_j$, are regarded as drawn from a population with variance $2\sigma_s^2$.

Another is a deviations component, designated d, due to the deviations
of the data from the linear scaling model used. These d values are treated
as if they were drawn from a population with variance $\sigma_d^2$ and average zero.
(They are, of course, fixed by the selection of stimuli and constitute a set of
numbers defined for $i \neq j$, with $d_{ij} = -d_{ji}$ and $\sum_{j \neq i} d_{ij} = 0$ for each i.)

Due to the fact that we are dealing with values computed from observed
proportions, we have a binomial error component, say b. These b values are drawn
from a population with variance $\sigma_b^2$. Since we are working with angular
transforms it is not exactly true that $E(b_{ij})$ is zero, but this is a good
approximation. It should be noted that it has been assumed that all judges
are drawn from the same population so that in this analysis no allowance
is made for stable individual differences. (Where stable individual
differences exist, a more complex analysis is required.)

The observed angle may thus be written in terms of these three
quantities, each of which may be thought of as drawn from its own population,
as follows:

$\theta_{ij} - \bar{\theta} = (s_i - s_j) + d_{ij} + b_{ij}$.

The population variances of these three components are $\sigma_s^2$,
$\sigma_d^2$, and $\sigma_b^2$. When the data are analyzed, the deviation of the observed $\theta$
from their mean, designated $\bar{\theta}$, is easily separated into two parts, one a
linear component in agreement with the law of comparative judgment, the
other a residual component, as follows:

$(\theta_{ij} - \bar{\theta}) = (\theta^*_{ij} - \bar{\theta}) + (\theta_{ij} - \theta^*_{ij})$
(total = linear + residual).


Correspondingly we have the three sums of squares.

Total: $S_T = \tfrac{1}{2} \sum_{i \neq j} (\theta_{ij} - \bar{\theta})^2$,
Linear: $S_L = \tfrac{1}{2} \sum_{i \neq j} (\theta^*_{ij} - \bar{\theta})^2$,
Residual: $S_D = \tfrac{1}{2} \sum_{i \neq j} (\theta_{ij} - \theta^*_{ij})^2$.

It may be noted that s, d, and b all affect the linear component (and also the
total), while the residual is not affected by s, but only by d and b. This separation
can now be used as the basis for an analysis of variance.

It should be noted that we are implicitly using the approximation

$\theta^*_{ij} = \dfrac{1}{k}\Big(\sum_{h \neq i} \theta_{ih} - \sum_{h \neq j} \theta_{jh}\Big) + \bar{\theta}$.

The actual $\theta^*_{ij}$, as in Mosteller's paper, is obtained as the angular transform
of $p^*_{ij}$ found from an empirical law of comparative judgment scale
fitted via the normal transform.
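
The three sums of squares can be illustrated under the idealized approximation just described, in which the fitted angles come from a least squares fit made directly on the angular scale. The sketch below is not the paper's procedure (which fits on the normal scale); the proportions are hypothetical and the function names are illustrative.

```python
import math

def angular(p):
    """Angular transform in radians: 2 * arcsin(sqrt(p))."""
    return 2.0 * math.asin(math.sqrt(p))

def sums_of_squares(p):
    """Return (S_T, S_L, S_D) for a k x k table of observed proportions.

    p[i][j] is the proportion preferring i to j, with p[j][i] = 1 - p[i][j];
    diagonal entries are ignored. Each unordered pair is counted once, which
    is what the factor 1/2 on the sum over ordered pairs accomplishes.
    """
    k = len(p)
    mean_angle = math.pi / 2.0
    # Deviations of observed angles from pi/2 (antisymmetric, zero diagonal).
    y = [[angular(p[i][j]) - mean_angle if i != j else 0.0
          for j in range(k)] for i in range(k)]
    # Least squares scale values on the angular scale: row sums divided by k.
    s = [sum(row) / k for row in y]
    s_total = s_linear = s_resid = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            fitted = s[i] - s[j]            # theta*_ij minus pi/2
            s_total += y[i][j] ** 2
            s_linear += fitted ** 2
            s_resid += (y[i][j] - fitted) ** 2
    return s_total, s_linear, s_resid

# Hypothetical proportions for three stimuli.
hypothetical_p = [
    [0.50, 0.30, 0.10],
    [0.70, 0.50, 0.35],
    [0.90, 0.65, 0.50],
]
S_T, S_L, S_D = sums_of_squares(hypothetical_p)
```

In this idealized case the linear and residual parts are orthogonal, so the total sum of squares equals the linear plus the residual sum of squares up to rounding. With a fit made on the normal scale that equality holds only approximately.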
Because of the nature of the fitting process, and because of the slightly
nonlinear relation between angles and normal deviates, the deviations of the
observations from their means have been separated into two parts which are
not formally "orthogonal." There is no necessity for

$\sum_{i \neq j} (\theta_{ij} - \theta^*_{ij})(\theta^*_{ij} - \bar{\theta})$

to vanish. Consequently the two expressions for the sum of squares associated
with the fit according to the law of comparative judgment,

$\tfrac{1}{2} \sum_{i \neq j} (\theta^*_{ij} - \bar{\theta})^2 = S_L$,

and

$\tfrac{1}{2} \sum_{i \neq j} (\theta_{ij} - \bar{\theta})^2 - \tfrac{1}{2} \sum_{i \neq j} (\theta_{ij} - \theta^*_{ij})^2 = S_T - S_D$,

need not be precisely the same. So long as these give substantially the same
answer, we may use either $S_L$ or $S_T - S_D$ in assessing a "reliability" without
serious error. Should they differ widely, reconsideration of the fitting would
be in order.


The linear, residual, and total mean squares, together with the number
of judges, N, and the number of stimuli, k, may be used to give estimates of
the variances as follows:

Total mean square: $T = \dfrac{S_T}{k(k-1)/2} = \text{est}\,(2\sigma_s^2 + \sigma_d^2 + \sigma_b^2)$;

Residual mean square: $D = \dfrac{S_D}{(k-1)(k-2)/2} = \text{est}\,(\sigma_d^2 + \sigma_b^2)$;

Binomial mean square: $\dfrac{1}{N} = \text{est}\,\sigma_b^2$;

Linear mean square: $L = \dfrac{S_L}{k-1} = \text{est}\,(k\sigma_s^2 + \sigma_d^2 + \sigma_b^2)$.

It should be noted, as pointed out above, that we also have another possible
value for the linear mean square given by

$\dfrac{k(T - D)}{2} + D = \dfrac{S_T - S_D}{k-1} = \text{est}\,(k\sigma_s^2 + \sigma_d^2 + \sigma_b^2)$.

We may also define an associated set of chi squares as follows:

$\chi^2_T = N S_T$, $\chi^2_L = N S_L$, $\chi^2_{T-D} = N(S_T - S_D)$, $\chi^2_D = N S_D$.

The basic formulas for the associated analyses are summarized in Table 4.
From the observed values, p, and fitted values, p*, the values $\theta$
and $\theta^*$ are found. These are used to compute $S_T$, $S_D$, and $S_L$, the sums
of squares. From these we get the mean square values designated T, D, and
L. These are used to give the estimates of variance components and
"reliabilities."
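
The passage from sums of squares to variance-component estimates can be sketched as below. The estimates are obtained simply by solving the expected-mean-square relations given above; the function name is illustrative. In the example, the residual sum of squares is taken as the reported Chicago chi square of about 74 divided by N = 100, while the total and linear sums of squares are hypothetical values chosen only to show the arithmetic.

```python
def variance_components(s_total, s_resid, s_linear, k, n_judges):
    """Estimate variance components from the three sums of squares.

    Degrees of freedom: total k(k-1)/2, residual (k-1)(k-2)/2, linear k-1.
    """
    T = s_total / (k * (k - 1) / 2)
    D = s_resid / ((k - 1) * (k - 2) / 2)
    L = s_linear / (k - 1)
    var_b = 1.0 / n_judges           # binomial component
    var_d = D - var_b                # deviations from the linear scale
    var_s_from_L = (L - D) / k       # scale-value component, via L
    var_s_from_T = (T - D) / 2.0     # same component, via T - D
    return {
        "T": T, "D": D, "L": L,
        "var_b": var_b, "var_d": var_d,
        "var_s_from_L": var_s_from_L, "var_s_from_T": var_s_from_T,
    }

# S_D from the reported chi square of about 74 with N = 100 (Chicago data);
# S_T and S_L below are hypothetical stand-ins, not values from Table 5.
components = variance_components(s_total=26.0, s_resid=0.74, s_linear=25.3,
                                 k=9, n_judges=100)
```

With the residual figure quoted in the text, the deviations component comes out near .016, in line with the estimate of 0.0166 reported for the Chicago data; the two routes to the scale-value component agree closely whenever the linear sum of squares is close to the total minus the residual.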

Application of the procedure indicated in Table 4 to the data of
Table 1 gives the results indicated in Table 5. In Table 5 the values obtained
for the Chicago group are indicated by C, the values for the Princeton group
by P, and the values obtained by pooling the numbers of judgments for the
two groups are indicated by T. The data on baseball teams presented by
Mosteller [13] are indicated by B.

The results show consistency in the variance components. Three estimates
of the linear component are available in the handwriting experiment, 0.3521
(Chicago), 0.2868 (Princeton), and 0.3115 (combined). Three estimates are
similarly available of a "deviations from scalability" component, 0.0166
(Chicago), 0.0171 (Princeton), and 0.0176 (combined). In comparison with the
linear component the deviations components are small and agree unusually
well among themselves. This fact suggests that we have systematic and
consistent, though small, deviations from the law of comparative judgment.


TABLE 4
Outline of Procedure

Analysis of Variance

Source of variation: Total
  Degrees of freedom: $k(k-1)/2$
  Sum of squares: $S_T = \tfrac{1}{2}\sum_{i \neq j}(\theta_{ij} - \bar{\theta})^2$
  Mean square: T
  Average value of mean square: $2\sigma_s^2 + \sigma_d^2 + \sigma_b^2$
  Angular chi square: $\chi^2_T = N S_T$

Source of variation: Law of comparative judgment (linear scale)
  Degrees of freedom: $k - 1$
  Sum of squares: $S_L = \tfrac{1}{2}\sum_{i \neq j}(\theta^*_{ij} - \bar{\theta})^2$, or $S_T - S_D$
  Mean square: L
  Average value of mean square: $k\sigma_s^2 + \sigma_d^2 + \sigma_b^2$
  Angular chi square: $\chi^2_L = N S_L$, or $\chi^2_{T-D} = N(S_T - S_D)$

Source of variation: Residual, not accounted for by linear scale
  Degrees of freedom: $(k-1)(k-2)/2$
  Sum of squares: $S_D = \tfrac{1}{2}\sum_{i \neq j}(\theta_{ij} - \theta^*_{ij})^2$
  Mean square: D
  Average value of mean square: $\sigma_d^2 + \sigma_b^2$
  Angular chi square: $\chi^2_D = N S_D$

Variance Components Analysis

Source of variation: Linear scale values. Symbol: $\sigma_s^2$. Estimate: $(L - D)/k$ or $(T - D)/2$.
Source of variation: Deviations from a linear scale. Symbol: $\sigma_d^2$. Estimate: $D - 1/N$.
Source of variation: Binomial sampling. Symbol: $\sigma_b^2$. Estimate: $1/N$.

k = number of items, all pairs of which are compared. N = number of judges for each pair.
$\theta = 2 \arcsin \sqrt{p}$ (observed values); $\theta^* = 2 \arcsin \sqrt{p^*}$ (fitted values);
$\bar{\theta} = 1.5708$ radians.


Variance Ratios


In dealing with psychological tests many different sets of variance ratios
have been used, giving various types of validity and reliability coefficients,


each having somewhat different properties and serving somewhat different
purposes. In general each coefficient is the ratio of a measure of "true variance"
to a measure of "observed variance" which includes both "true variance
and error variance." One reasonable interpretation for paired comparisons
is to regard the linear component, $2\sigma_s^2$, as "true variance" and the other
two components, $\sigma_d^2 + \sigma_b^2$, as error variance, so that we may define a coefficient
of linear consistency by

$\rho_l = \dfrac{2\sigma_s^2}{2\sigma_s^2 + \sigma_d^2 + \sigma_b^2}$, estimated by $r_l = \dfrac{2(L - D)}{kT}$.

The factor 2 arises from the fact that $\sigma_s^2$ was normalized in terms of
individual stimuli, while $\sigma_d^2$ and $\sigma_b^2$ are normalized in terms of differences.
That is, $\sigma_s^2$ is the variance of the k different s values, while the variance of
the k(k − 1) values $s_i - s_j$ is $2\sigma_s^2$, and the observed variance for the cell
entries is $2\sigma_s^2 + \sigma_d^2 + \sigma_b^2$.

If the linear sum of squares is taken as $S_T - S_D$, instead of $S_L$, then we
have another estimate for the coefficient of linear consistency,

$r_{T-D} = \dfrac{T - D}{T} = \text{est}\,\dfrac{2\sigma_s^2}{2\sigma_s^2 + \sigma_d^2 + \sigma_b^2}$.
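
The coefficient of linear consistency can be computed directly from estimated variance components. The sketch below uses the component estimates quoted in the text for the Chicago and Princeton groups; it assumes that the reported "linear component" is the estimate of the scale-value variance and that the binomial component is 1/N with N = 100. The function name is illustrative.

```python
def linear_consistency(var_s, var_d, var_b):
    """Ratio of 'true' variance 2*var_s to the observed cell variance.

    var_s: variance of the scale values
    var_d: variance of deviations from the linear scale
    var_b: binomial sampling variance (1/N on the radian angular scale)
    """
    true_variance = 2.0 * var_s
    return true_variance / (true_variance + var_d + var_b)

# Component estimates quoted in the text; N = 100 judges in each group.
r_chicago = linear_consistency(var_s=0.3521, var_d=0.0166, var_b=1.0 / 100)
r_princeton = linear_consistency(var_s=0.2868, var_d=0.0171, var_b=1.0 / 100)
```

Under these assumptions the coefficient is roughly .96 for the Chicago group and roughly .95 for the Princeton group, so the deviations and binomial components together account for only a few per cent of the observed cell variance.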
These coefficients, $r_l$ and $r_{T-D}$, indicate the extent to which the linear
model, as represented by the fitted values $\theta^*$, fits the observed cell entries,
given by $\theta$. For example, if the agreement is perfect, then $S_D$ and D will
equal zero, and $S_L$ will equal $S_T$, which means that 2L/k = T so that
$r_l = r_{T-D} = 1.00$. If, on the other hand, the mean squares T, L, and D are all equal,
then $r_l = r_{T-D} = 0.00$. These coefficients $r_l$ and $r_{T-D}$ are regarded as similar to
$r_{ot}^2$, the square of the correlation between observed and true values assuming
the linear model. Alternatively, $r_l$ and $r_{T-D}$ may be regarded as representing
the correlation between two sets of observed values provided their correlation
is entirely accounted for by the true values, assuming a linear model. The
coefficients $r_l$ or $r_{T-D}$ may be regarded as appropriate to the recomparison of
a randomly selected pair of the nine handwriting specimens against a background
of seven other specimens covering the same range of merit and hence
drawn from a population having the same $\sigma_s^2$ as the specimens used in this
experiment. For example, if another set of three specimens each of values
50, 70, and 80 were used, a similar $\sigma_s^2$ would be expected; if $\sigma_d^2$ and $\sigma_b^2$ also
remained about the same, a similar degree of agreement between fitted and
observed values, i.e., a similar coefficient of linear consistency, would be
expected.

However, if all the handwriting specimens from 10 to 90 in the Ayres
scale were used, one would expect a larger $\sigma_s^2$, and if, as seems plausible,
$\sigma_d^2$ remained about the same the result would be a higher coefficient than that
found here using only values 50, 70, and 80. On the other hand, if one used
only specimens 50, 60, and 70, a slightly smaller $\sigma_s^2$ and (if $\sigma_d^2$ remained about
the same) a slightly lower reliability would be expected.

[TABLE 5 (analysis of variance, estimated variance components, and reliabilities) is not
recoverable from the source scan. Legible key: (C) = Chicago data, k = 9, N = 100;
(P) = Princeton data, k = 9, N = 100; (T) = these two together, k = 9, N = 200;
(B) = baseball data from Mosteller [13].]

It can be seen that even though Mosteller's chi square goodness of fit
test shows clearly that the handwriting data deviate significantly from
a linear scale, nevertheless the scales show a satisfactory agreement with
the linear model, about .95 for the case where the nine handwriting specimens
were rated by 100 or 200 judges. Since only $2\sigma_s^2$ is considered to be true
variance, the coefficients given by $r_1$ and $r_{11}$ will be what are usually termed
"conservative" estimates. A "dashing" estimate for reliability is obtained
by regarding $\sigma_d^2$ as part of the true variance rather than as part of the error
variance. Thus we have


1 
H а бл WET а 


рь = 3 = = т. 
Jui puer. T . 


This definition yields for the handwriting data reliabilities of .98 or .99.
This coefficient represents the correlation between two sets of $\theta$ values for
the same stimuli judged by another random sample of people from the same
population. Coefficients computed from this formula are appropriate to the
recomparison of a randomly selected pair of the nine specimens against a
background of seven other handwriting specimens drawn from a population
having the same $\sigma_s^2$ and also the same peculiarities that produced the
deviations from linearity. One possibility i[...]

[...] A corresponding chi square is given by $\chi_T^2 = N S_T$ with $(k/2)(k - 1)$
degrees of freedom. These values of chi square ($\chi_T^2$ in Table 5) are all extremely
large, indicating a negligible probability that the data could have arisen by
random sampling from a population in which the proportions were all .5.

The coefficient $r_0$, which is zero if the percentages of Table 1 are all
random binomial deviations from .5, may be compared with Kendall's
coefficient of agreement ([10], pp. 125ff.; [11], pp. 333ff.), which is unity only
if all proportions are 1.0 or 0.0, i.e., if there is complete agreement among all
judges in making each judgment. Kendall's coefficient of agreement is
determined directly from the experimental frequencies, without using any
transforms such as the arc sine. The data here presented cannot be regarded
as showing such agreement among all judges. However, they clearly cannot
be regarded as indicating only random judgments.

We may compare these coefficients computable for a single set of data
with more conventional reliabilities obtained by comparing the Princeton
with the Chicago scale values. The correlation between the two sets of values
in Table 2 is .989, which, it may be noted, is similar in magnitude to


HAROLD GULLIKSEN AND JOHN W. TUKEY 109 


If we 
make no allowa in di 
à з ance for changes in diserimi i i 
entire difference of scale values insted | pri A er ا‎ 
es 2 а а adjus а common mean 
mmon variance as error, then ps ims 


 =1- ZZE = ов 
е эгизи 
шу: 18 similar in magnitude to the estimates of p, . 

“йел irs се have been suggested. The coefficient r, indicates the 
able +5 uni e pe > are differentiated by the subjects. It seems reason- 
Ялы un M dee а conservative estimate of consistency for a single 
would be no repli y the law of comparative judgment. In such a case there 
reasonably b plication to indicate that c; might, from some points of view, 
give a. di e regarded as part of the true variance. The estimates r, and r,, 
rect measure of the agreement between the observed 0 and fitted 


0* ү 1 
alues of the arcsin МЇ. К 
he data on baseball 


The lines labelled B in Table 5 give for comparison the data on baseball
presented by Mosteller [13]. It is interesting to note that despite the
low chi square, the reliability, $r_1$ or $r_{11}$, is only .73, while $r_0 = .62$.
This low reliability is due apparently to the similarity of the different teams,
$\sigma_s^2$ being only .0439, which is less than the binomial variation of .0455
with which $\sigma_s^2$ must be combined. Under these circumstances it is not
surprising that chi square is not significant, especially with $N$ as low as 22. On
the other hand, the data on handwriting have a smaller binomial variance
and a much larger $\sigma_s^2$ (about .3). Despite the fact that the residual
is slightly smaller than that for the baseball data, when $N$
equals 100 or 200 with 28 degrees of freedom, this much smaller discrepancy
cannot be regarded as due to chance.
In summary, a variance-components analysis has been presented for
paired comparisons. This analysis gives estimates of the variance of the
actual scale values, $\sigma_s^2$, and the variance of observations due to deviations
from the linear paired comparisons model, $\sigma_d^2$, which are compared
with the binomial sampling variance $\sigma_b^2$. A variety of coefficients based on
these variances is also presented. If one is interested in asking whether
or not the subjects' responses are purely random, then Kendall's coefficient
of agreement or the $r_0$ as presented here may be used. If one is interested
in the extent to which the law of comparative judgment accounts for the
data, then $r_1$ or $r_{11}$ would be the appropriate coefficient.

REFERENCES

[1] Ayres, L. P. A scale for measuring the quality of handwriting of school children. New York: Russell Sage Foundation, 1912.
[2] Bennett, C. A. and Franklin, N. L. Statistical analysis in chemistry and the chemical industry. New York: Wiley, 1954.
[3] Cochran, W. G. The $\chi^2$ test of goodness of fit. Ann. math. Statist., 1952, 23, 315-345.
[4] Davies, O. L. (Ed.) Design and analysis of industrial experiments. New York: Hafner, 1954.
[5] Duncan, A. J. Quality control and industrial statistics. Chicago: Richard D. Irwin, 1952.
[6] Fisher, R. A. Statistical methods for research workers. (10th ed.) London: Oliver and Boyd, 1946.
[7] Fisher, R. A. and Yates, F. Statistical tables for biological, agricultural and medical research. New York: Hafner, 1953.
[8] Freeman, M. F. and Tukey, J. W. Transformations related to the angular and the square root. Ann. math. Statist., 1950, 21, 607-611.
[9] Hald, A. Statistical tables and formulas. New York: Wiley, 1952.
[10] Kendall, M. G. Rank correlation methods. London: Griffin, 1948.
[11] Kendall, M. G. and Babington Smith, B. On the method of paired comparisons. Biometrika, 1940, 31, 324-345.
[12] Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950.
[13] Mosteller, F. Remarks on the method of paired comparisons. III. A test of significance for paired comparisons when equal standard deviations and equal correlations are assumed. Psychometrika, 1951, 16, 207-218.
[14] Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.
[15] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.
[16] Tippett, L. H. C. The methods of statistics. (4th ed.) New York: Wiley, 1952.


Manuscript received 7/1/57 


Revised manuscript received 9/25/57 


PSYCHOMETRIKA—VOL, 23, No. 2 
JUNE, 1958 


AN INTER-BATTERY METHOD OF FACTOR ANALYSIS* 


LEDYARD R TUCKER 


EDUCATIONAL TESTING SERVICE 
AND 
PRINCETON UNIVERSITY 


The inter-battery method of factor analysis was devised to provide
information relevant to the stability of factors over different selections of
tests. Two batteries of tests, postulated to depend on the same common factors,
but not parallel tests, are given to one sample of individuals. Factors are
determined from the correlation of the tests in one battery with the tests in
the other battery. These factors are only those that are common to the two
batteries. No communality estimates are required. A statistical test is
provided for judging the minimum number of factors involved. Rotation of axes
is carried out independently for the two batteries. A final step provides the
correlation between factors determined by scores on the tests in the two
batteries. The correlations between corresponding factors are taken as factor
reliability coefficients.

Factor analysis has been used for a number of years in the explorations
for basic mental traits. Results from the numerous studies have indicated
the existence of a number of these traits with several traits being relatively
firmly established [11]. Sets of reference tests to represent 16 such abilities
have been prepared by special committees of psychologists [12, 13]. A universal
index for psychological factors has been proposed by Cattell [7]. There have
been a number of criticisms of the factorial methods, however, on the grounds
of unknown stability of results. Serious questions have been raised as to
the justification of factor analysis results and propriety of use of the method.
McNemar [21], for example, reported an empirical study of factorial stability
in which he concluded that the first centroid factor loadings had standard
errors approximating those for correlation coefficients but that the second
and succeeding centroid factor loadings had much larger standard errors.
He has criticized factor analysts for analyzing and interpreting factors
beyond any point justified by their data [23]. A need exists for an objective
method of factor analysis which yields coefficients indicative of the
appropriateness of accepting the obtained results.

*This research was jointly supported by Princeton University and the Office of
Naval Research under contract N6onr-270-20 and the National Science Foundation
under grant ...-642; Harold Gulliksen, principal investigator. The preparation of
the accompanying material was aided by the Educational Testing Service. The
author is grateful to Harold Gulliksen and Samuel S. Wilks for many most helpful
comments and suggestions.

Theoretical research on stability of factor analysis results has
proven to be very difficult due to the complexity of the problem. Even so,
some progress has been made. In addition to the study by McNemar, several
improved methods and sampling error formulas have been developed by
Bartlett [2, 3, 4], Burt [5, 6], Lawley [17, 18, 19, 20], and Rao [24]. Other
related work has been published by Emmett [9], Henrysson [14], Hoel [15],
McNemar [22], Rippe [25], Wold [28], and Young [29]. The procedures
indicated are so complex, however, that the computations for usual-sized
factor analyses are feasible only with the use of large electronic computers.
These developments have attacked only one of the problems of stability of 
results, the statistical significance of the results when compared with possible 
chance results from unrelated measures for a random sample of individuals. 
This is, indeed, an important problem to the psychologist, but it might be 
classified as an operational problem which might be overcome by repetitive 
studies on several samples of individuals. The samples might be made large 
enough to support the results obtained. 

A second and more important problem to the psychologist is the stability 
of factors when there are changes in the battery of tests analyzed. How well 
can factors be identified between two analyses when different tests are used? 
Do the factors transcend a particular battery? Thurstone ([26], p. 360) states 
the problem as follows: “The problem of factorial invariance should be 
analyzed with regard to the central purpose of factor analysis, the object of 
which is to discover a set of significant and meaningful parameters for de- 
scribing a domain.” He further states ([26], p. 361), “It is a fundamental 
criterion of a valid method of isolating primary abilities that the weights of 
the primary abilities for a test must remain invariant when it is moved from 
one battery to another which involves the same common factors.” We wish 
to add a criterion that the scores for individuals on a factor should remain
invariant as the individuals are tested with different batteries which involve the
factor. Surely these are the important propositions to psychologists if the
factors are to be considered as basic traits. 


In order to quantify the problem of factor stability over changes in a
test battery, a factor reliability coefficient is proposed. Consider that two
distinct batteries of tests are administered to one sample of people. Let these
batteries be designated as battery 1 and battery 2.

The matrices of intercorrelations for each of these batteries may be
computed and designated $R_{11}$ and $R_{22}$. In addition a matrix of correlations
between tests in battery 1 and tests in battery 2 may be obtained and
designated $R_{12}$ (or its transpose $R_{21}$). The complete matrix of intercorrelations
for all tests given is represented as a supermatrix

(1)    $R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}$.

By employing the matrix $R_{12}$, the correlation may be determined between
any factor obtained from battery 1 and any factor obtained from battery 2.
Let $p_1$ represent a factor obtained from battery 1 and $q_2$ represent a factor
obtained from battery 2. Suppose that one of the factors $q$ is matched with
factor $p_1$ and that this matched $q$ is designated $p_2$. The correlation between
$p_1$ and $p_2$ can be interpreted as a factor reliability coefficient for factor $p$.
A high value of this coefficient would indicate high factor stability from
battery 1 to battery 2. A low value of this coefficient would indicate little
correspondence between the factors for the two batteries. Thus, the factor
reliability coefficients would yield a quantitative answer to the problem of
factorial stability associated with changes in the test batteries.
The inter-battery method of factor analysis to be described here depends
on a finding that factor matrices on reference factors can be determined for
the two batteries from just the matrix of correlations $R_{12}$ between the two
batteries. It is to be noted that only the factors common to the two batteries
are obtained and not factors that are represented in only one of the
batteries. The intercorrelations for each battery are used only in obtaining
factor variances for a test of statistical significance for the factors determined
and for the factor reliability coefficients. This procedure has certain
similarities to Hotelling's most predictable criterion [16] and Bartlett's external
factor analysis [1]. One feature resulting from use of only the matrix $R_{12}$ is
that communalities are not involved in the factoring.

An example of the inter-battery method of factor analysis given here
uses data published by L. L. Thurstone and T. G. Thurstone [27]. Among
the tests included in this study were a number intended to depend upon the
word fluency factor and several more intended for the verbal factor. As shown
in Table 1, nine tests have been distributed into two batteries. Battery 1
includes two word fluency tests and two verbal tests. Battery 2 includes
three word fluency tests and two verbal tests. Although this selection and
distribution of tests might seem to have depended upon the analysis by the
Thurstones of the data to be used in our example, pretend for present purposes
that the decisions on selection and distribution of tests to batteries had
preceded the analysis. This does not seem too unreasonable since both of
these factors had been isolated previously and the tests were included by
the Thurstones for these factors. The tests have been assigned arbitrarily
to the batteries.
The intercorrelations of the tests are given in Table 2. $R_{11}$ contains the
intercorrelations for battery 1, and $R_{22}$ the intercorrelations for battery 2.
These matrices are used in only the statistical tests for the factors
and the computation of the coefficients of factor reliability. The
matrices $R_{12}$ and $R_{21}$ will be used in the determination of the factors.
These two matrices are the transposes of each other and
contain the correlations between the two batteries. The highest correlations,
.6 to .7, in $R_{12}$ are for variables 45 and 46 of battery 1 with variables 10 and
51 of battery 2. Correlations of .4 to .5 occur for variables 42 and 54 of battery
1 with variables 23, 24, and 27 of battery 2. These two groups of higher
correlations correspond to the postulated verbal and word fluency factors.
This is exactly what should have been expected because the possession of
common factors should raise the correlations in such a pattern.

TABLE 1

Tests Selected for the Example of
Inter-Battery Method of Factor Analysis

Major           Battery 1                                     Battery 2
Factor          Test No.  Test Name                           Test No.  Test Name

Word Fluency    42        Prefixes                            23        First and Last Letters
                54        Suffixes                            24        First Letters
                                                              27        Four-Letter Words

Verbal          45        Chicago Reading Test: Vocabulary    10        Completion
                46        Chicago Reading Test: Sentences     51        Same or Opposite

TABLE 2

Intercorrelations of the Tests

Test      42      54      45      46      23      24      27      10      51

42     1.000    .554    .227    .189    .461    .506    .408    .280    .241
54      .554   1.000    .296    .219    .479    .530    .425    .311    .311
45      .227    .296   1.000    .769    .237    .243    .304    .718    .730
46      .189    .219    .769   1.000    .212    .226    .291    .681    .661

23      .461    .479    .237    .212   1.000    .520    .514    .313    .245
24      .506    .530    .243    .226    .520   1.000    .473    .348    .290
27      .408    .425    .304    .291    .514    .473   1.000    .374    .306
10      .280    .311    .718    .681    .313    .348    .374   1.000    .672
51      .241    .311    .730    .661    .245    .290    .306    .672   1.000


Determination of Inter-Battery Factors

The fundamental factor theorem given by Thurstone [26] in matrix
notation is

(2)    $R = FF'$,

where the factors are uncorrelated. For our case of two batteries of tests,
the factor matrix $F$ may be considered as a supermatrix

(3)    $F = \begin{bmatrix} A_1 & G_1 & 0 \\ A_2 & 0 & G_2 \end{bmatrix}$.

$A_1$ is a matrix for battery 1 and $A_2$ is a matrix for battery 2 for factors
common to the two batteries. $G_1$ contains factors appearing only in battery 1;
$G_2$ contains factors appearing only in battery 2. Substitution of (1)
and (3) into (2) yields

(4)    $\begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix} = \begin{bmatrix} A_1 & G_1 & 0 \\ A_2 & 0 & G_2 \end{bmatrix} \begin{bmatrix} A_1' & A_2' \\ G_1' & 0 \\ 0 & G_2' \end{bmatrix}$,

from which is obtained

(5)    $R_{12} = A_1 A_2'$.

A least squares fit to $R_{12}$ may be obtained for any rank $u$ (number of
factors) of $A_1$ and $A_2$ from a development by Eckart and Young [8] for the
approximation of one matrix by another of lower rank. The application of
the Eckart and Young development to the present case is sketched as follows.
Let there be $n_1$ tests in battery 1 and $n_2$ tests in battery 2. Further,
let the tests in battery 1 be designated $j$ (or $J$) and the tests in battery 2 be
designated $k$ (or $K$). Define a matrix $H_1$ with entries $h_{jJ}$ by

(6.1)    $H_1 = R_{12} R_{21}$,

and

(7.1)    $h_{jJ} = \sum_{k=1}^{n_2} r_{jk} r_{Jk}$.

Note that $H_1$ is symmetric and that the diagonal entries are the sums of
squares of the correlations in rows of $R_{12}$,

(8.1)    $h_{jj} = \sum_{k=1}^{n_2} r_{jk}^2$.

Then, the sum of the diagonal entries of $H_1$ is the sum of squares of correlations
in $R_{12}$,

(9.1)    $\sum_{j=1}^{n_1} h_{jj} = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2$.

Consider the characteristic roots and vectors of $H_1$, $\gamma_f^2$ being the root
$f$, and $W_{f1}$ the corresponding unit vector. Entries in $W_{f1}$ are $w_{jf}$. Then, from
properties of characteristic roots and vectors (see, for example, [26], pp.
500-503),

(10.1)    $H_1 W_{f1} \gamma_f^{-2} = W_{f1}$,

and

(11.1)    $\sum_{f=1}^{n_1} \gamma_f^2 = \sum_{j=1}^{n_1} h_{jj}$.

Substitution from (9.1) yields

(12)    $\sum_{f=1}^{n_1} \gamma_f^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2$.

For convenience in subsequent discussion, let the roots $\gamma_f$ be arranged in
descending order

(13)    $\gamma_1 \geq \gamma_2 \geq \gamma_3 \geq \cdots \geq \gamma_{n_1}$.

Substitution from (6.1) into (10.1) yields

(14.1)    $R_{12} R_{21} W_{f1} \gamma_f^{-2} = W_{f1}$.

Define a vector $W_{f2}$ with entries $w_{kf}$,

(15.1)    $R_{21} W_{f1} \gamma_f^{-1} = W_{f2}$.

Substitution from (15.1) into (14.1) yields

(15.2)    $R_{12} W_{f2} \gamma_f^{-1} = W_{f1}$.

Substitution of the value of $W_{f1}$ in (15.2) into the left member of (15.1)
yields

(14.2)    $R_{21} R_{12} W_{f2} \gamma_f^{-2} = W_{f2}$.

Note that (14.2) is similar in form to (14.1) but involves $W_{f2}$ instead of $W_{f1}$.
Define

(6.2)    $H_2 = R_{21} R_{12}$,


(7.2)    $h_{kK} = \sum_{j=1}^{n_1} r_{jk} r_{jK}$.

The similarity of these equations to (6.1) and (7.1) is to be noted. For $H_1$,
scalar products of row vectors of $R_{12}$ are obtained, while for $H_2$, scalar
products of column vectors of $R_{12}$ are obtained. Substitution of (6.2) into
(14.2) yields

(10.2)    $H_2 W_{f2} \gamma_f^{-2} = W_{f2}$,

which parallels equation (10.1). As a consequence, $\gamma_f^2$ is a characteristic root
of $H_2$, and $W_{f2}$ is the corresponding characteristic vector. It can be
demonstrated that $W_{f2}$ is a unit vector.
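
The unit length of $W_{f2}$ asserted above follows in a few lines from (15.1), (6.1), and (10.1). The following is only a sketch of that argument in the notation of the text, assuming that $W_{f1}$ has been scaled to unit length and that $\gamma_f \neq 0$:

```latex
\begin{aligned}
W_{f2}' W_{f2}
  &= \gamma_f^{-2}\, W_{f1}' R_{12} R_{21} W_{f1}
     && \text{by (15.1), since } R_{21}' = R_{12} \\
  &= \gamma_f^{-2}\, W_{f1}' H_1 W_{f1}
     && \text{by (6.1)} \\
  &= \gamma_f^{-2}\, \gamma_f^{2}\, W_{f1}' W_{f1}
     && \text{by (10.1)} \\
  &= 1 .
\end{aligned}
```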


Define

(16)    $r_{u \cdot jk} = r_{jk} - \sum_{f=1}^{u} w_{jf} \gamma_f w_{kf}$,

(17)    $R_{u \cdot 12} = R_{12} - \sum_{f=1}^{u} W_{f1} \gamma_f W_{f2}'$,

where the first $u$ roots $\gamma_f$, and vectors $W_{f1}$ and $W_{f2}$ are employed. Since the
roots were ordered in magnitude as in (13), these first $u$ roots are the $u$ largest
roots. As in (6.1) and (7.1) let

(18.1)    $H_{u \cdot 1} = R_{u \cdot 12} R_{u \cdot 21}$,

and

(19.1)    $h_{u \cdot jJ} = \sum_{k=1}^{n_2} r_{u \cdot jk} r_{u \cdot Jk}$.

Again the sum of the diagonal entries in $H_{u \cdot 1}$ is the sum of squares of the
entries in $R_{u \cdot 12}$,

(20.1)    $\sum_{j=1}^{n_1} h_{u \cdot jj} = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2$.

It can be demonstrated that

(21.1)    $\sum_{j=1}^{n_1} h_{u \cdot jj} = \sum_{j=1}^{n_1} h_{jj} - \sum_{f=1}^{u} \gamma_f^2$.

Thus, from (9.1), (20.1), and (21.1),

(22)    $\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2$.

Equation (22) provides a means for determining the closeness of approximation
from the original correlations and the roots extracted. It is not
necessary to determine the smaller roots nor to compute the matrix $R_{u \cdot 12}$.
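
The identity in (22) can be checked numerically. The sketch below is not part of the paper: it uses NumPy's singular value decomposition, whose singular values coincide with the $\gamma_f$ of the text (their squares are the characteristic roots of $R_{12} R_{21}$). The function name and the random matrix `R12_demo` are hypothetical and serve only as an illustration.

```python
import numpy as np

def residual_sum_of_squares(R12, u):
    """Return the residual sum of squares after u factors, computed two ways."""
    # Singular values of R12 play the role of gamma_f; their squares are the
    # characteristic roots of H1 = R12 R21.
    W1, gamma, W2t = np.linalg.svd(R12, full_matrices=False)
    fitted = (W1[:, :u] * gamma[:u]) @ W2t[:u, :]   # sum over f of W_f1 gamma_f W_f2'
    direct = float(np.sum((R12 - fitted) ** 2))     # left member of (22)
    via_roots = float(np.sum(R12 ** 2) - np.sum(gamma[:u] ** 2))  # right member
    return direct, via_roots

# Hypothetical 4 x 5 matrix standing in for R12 (illustration only).
rng = np.random.default_rng(0)
R12_demo = rng.uniform(-0.5, 0.8, size=(4, 5))

for u in range(5):
    direct, via_roots = residual_sum_of_squares(R12_demo, u)
    print(u, f"{direct:.6f}", f"{via_roots:.6f}")
```

The two columns printed for each $u$ should agree to rounding error, which is the practical content of the remark that the residual matrix need not be formed.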


In order to obtain the factor matrices $A_1$ and $A_2$ from the foregoing
characteristic vectors let

(23.1)    $A_{f1} = W_{f1} \gamma_f^{1/2}$,

(23.2)    $A_{f2} = W_{f2} \gamma_f^{1/2}$.

$A_{f1}$ is column $f$ of $A_1$, and $A_{f2}$ is column $f$ of $A_2$. Let $a_{jf}$ be the entry in row
$j$ of $A_{f1}$, and $a_{kf}$ be the entry in row $k$ of $A_{f2}$. It is to be noted that $A_1$ and
$A_2$ will each have $u$ columns. Equations (23.1) and (23.2) then can be written

(24.1)    $a_{jf} = w_{jf} \gamma_f^{1/2}$,

(24.2)    $a_{kf} = w_{kf} \gamma_f^{1/2}$.

Substitution from (24.1) and (24.2) into (16) yields

(25)    $r_{u \cdot jk} = r_{jk} - \sum_{f=1}^{u} a_{jf} a_{kf}$.

This equation may be written in matrix form as

(26)    $R_{u \cdot 12} = R_{12} - A_1 A_2'$.

A comparison of (26) with (5) indicates that $R_{u \cdot 12}$ is the matrix of errors
in approximating $R_{12}$ by $A_1 A_2'$. Equation (22) gives the sum of squares of
these errors. By selecting the $u$ largest roots, $\gamma_f^2$, this sum of squares is made
a minimum for a given number, $u$, of factors. Computations could be
accomplished as outlined below.


1. Select the battery with the smaller number of variables to be battery 1.
Thus

(27)    $n_1 \leq n_2$.

2. Compute the matrix $H_1$ from (6.1),

(6.1)    $H_1 = R_{12} R_{21}$.

3. Determine the characteristic roots, $\gamma_f^2$, and vectors $W_{f1}$, of $H_1$.
This step might be accomplished by one of the iterative methods such as
Hotelling's method for determination of principal axes or components.
(For a description see [26], pp. 483-484.) In this case $H_1$ is treated like a
correlation matrix or a covariance matrix (but the diagonals are not
adjusted for communalities). When a principal axis is found, it is adjusted to
a unit vector by dividing each loading by the square root of the characteristic
root. (Note that "latent root" and "characteristic root" are synonymous.)
An alternative is to have these roots and vectors computed on an electronic
computer.

4. Arrange the characteristic roots in descending order and determine
from (22) the sum of squares of residual correlations,

(22)    $\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{u \cdot jk}^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2$,

for successive removal of factors $f$. That is, $\sum_j \sum_k r_{u \cdot jk}^2$
is to be determined successively for $u$ of 1, of 2, of 3, etc. When this sum
of squares is as small as desired, the preceding factors are sufficient.
If the characteristic roots and vectors are determined in step 3 by a
procedure which yields the roots and vectors one at a time in descending
order of the roots, each successive root may be tested in this way as each
root and vector is determined.

5. Then the characteristic vectors $W_{f2}$ are computed from (15.1),

(15.1)    $R_{21} W_{f1} \gamma_f^{-1} = W_{f2}$.

6. A check on the determination of the characteristic vectors, $W_{f1}$ and
$W_{f2}$, and the roots $\gamma_f^2$ is provided by (15.2),

(15.2)    $R_{12} W_{f2} \gamma_f^{-1} = W_{f1}$.

A further check should be made that both $W_{f1}$ and $W_{f2}$ are unit vectors,
that is,

(28.1)    $\sum_{j=1}^{n_1} w_{jf}^2 = 1$,

(28.2)    $\sum_{k=1}^{n_2} w_{kf}^2 = 1$.

7. Compute the columns of the factor matrices $A_1$ and $A_2$ by (23.1)
and (23.2),

(23.1)    $A_{f1} = W_{f1} \gamma_f^{1/2}$,

(23.2)    $A_{f2} = W_{f2} \gamma_f^{1/2}$.
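
Steps 3 and 4 above can be sketched with a simple power iteration followed by deflation, which yields the roots one at a time in descending order. This is offered in the spirit of the iterative method mentioned in step 3, not as a transcription of Hotelling's procedure; the function name `principal_axes`, the starting vector, and the tolerance are assumptions made for illustration. The matrix `R12` holds the cross-battery correlations of the example (rows: tests 42, 54, 45, 46; columns: tests 23, 24, 27, 10, 51).

```python
import numpy as np

def principal_axes(H, u, max_iter=1000, tol=1e-12):
    """Extract the u largest roots and unit vectors of a symmetric matrix H,
    one at a time, by repeated multiplication and deflation."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    roots, vectors = [], []
    for _factor in range(u):
        w = np.full(n, 1.0 / np.sqrt(n))      # arbitrary unit trial vector
        for _step in range(max_iter):
            v = H @ w
            v /= np.linalg.norm(v)            # rescale to unit length
            converged = np.linalg.norm(v - w) < tol
            w = v
            if converged:
                break
        root = float(w @ H @ w)               # characteristic root, gamma_f squared
        roots.append(root)
        vectors.append(w)
        H = H - root * np.outer(w, w)         # remove the factor just found
    return np.array(roots), np.column_stack(vectors)

# Cross-battery correlations of the example.
R12 = np.array([
    [.461, .506, .408, .280, .241],
    [.479, .530, .425, .311, .311],
    [.237, .243, .304, .718, .730],
    [.212, .226, .291, .681, .661],
])
H1 = R12 @ R12.T                              # step 2, equation (6.1)
roots, W1 = principal_axes(H1, u=2)           # step 3
residual_ss = np.sum(R12 ** 2) - np.cumsum(roots)   # step 4, equation (22)
print(np.round(roots, 4), np.round(residual_ss, 4))
```

Because each root is available as soon as it is found, the residual sum of squares can be inspected after every factor and the extraction stopped when it is small enough.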
The computation of factor loadings for our example is given in Table 3.
$H_1$ of (6.1) was obtained by computing the sums of squares of
rows and the sums of cross products between rows of the off-diagonal correlation
matrix $R_{12}$ of Table 2. A similar matrix $H_2$ could have been computed from
the sums of squares of the columns and sums of products between columns
of $R_{12}$; it would have had five rows and columns. Only one of these
matrices is required. The characteristic vectors of $H_1$ were obtained, the
first two being listed as columns of the matrix $W_1$. The corresponding roots
are given in the row $\gamma_f^2$ near the bottom of the table. The sum of squares of
the correlations in $R_{12}$ was 3.9934. After one factor the sum of squares of the
residual correlations, $r_{u \cdot jk}$, was .4570 as given in the bottom row of Table 3.
After two factors, the sum of squares of the residual correlations was down
to .0008. Since this indicated quite small errors of approximation, and since
the subsequent statistical test indicated that no more factors were justified
by the data of our example, the two factors were considered to be sufficient.
The matrix $W_2$ was found by (15.1).

TABLE 3

Determination of Inter-Battery Reference Factors

Battery 1

Test    $H_1 = R_{12} R_{21}$                    $W_1$ (characteristic      $A_1$ ($A_{f1} = W_{f1} \gamma_f^{1/2}$)
No.                                      vectors of $H_1$)
        42      54      45      46       I        II          I       II

42     .7715   .8244   .7332   .6808    .4203    .5669       .576    .466
54     .8244   .8844   .8218   .7624    .4610    .5391       .632    .443
45     .7332   .8218  1.2561  1.1651    .5729   -.4569       .786   -.376
46     .6808   .7624  1.1651  1.0814    .5316   -.4234       .729   -.348

Battery 2

Test    $R_{21} W_1$            $W_2$ ($W_{f2} = R_{21} W_{f1} \gamma_f^{-1}$)     $A_2$ ($A_{f2} = W_{f2} \gamma_f^{1/2}$)
No.      I        II         I        II                      I       II

23      .6631    .3215      .3526    .4760                   .484    .391
24      .7164    .3659      .3809    .5417                   .522    .445
27      .6965    .1983      .3703    .2936                   .508    .241
10     1.0344   -.2900      .5501   -.4293                   .754   -.353
51     1.0143   -.3091      .5394   -.4577                   .740   -.376

Characteristic roots of $H_1$

                                             I        II
$\gamma_f^2$                                3.5364    .4562
$\gamma_f$                                  1.8805    .6754
$\gamma_f^{1/2}$                            1.3713    .8218
Sum of squares of correlations in $R_{12}$: 3.9934
Sum of squares of residual correlations     .4570    .0008

When the vectors of $W_2$ and the values of $\gamma_f^{-1}$ were substituted into
(15.2), the vectors $W_{f1}$ were obtained. This provided a check on the
determination of the characteristic roots and vectors. That the columns of $W_1$ and
$W_2$ are unit vectors was checked by verifying that the sums of squares of
the entries in these columns were unity (within rounding error). Each column
of the $W$ matrices was multiplied by the corresponding $\gamma_f^{1/2}$ as is
indicated in (23.1) and (23.2). This produces common factor matrices $A_1$
and $A_2$. These matrices reproduce the given matrix $R_{12}$ in a least square
sense by equation (5).
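
As an illustration of the alternative mentioned in step 3 (roots and vectors obtained on a computer), the following sketch runs steps 2 through 7 on the example with NumPy's symmetric eigensolver. It is a minimal sketch, not the author's computing routine: the function name `inter_battery_factors` and the sign convention (each column of $W_1$ is oriented to have a positive sum) are assumptions made here. The correlations are those of $R_{12}$ in the example.

```python
import numpy as np

# Cross-battery correlations R12 of the example
# (rows: tests 42, 54, 45, 46; columns: tests 23, 24, 27, 10, 51).
R12 = np.array([
    [.461, .506, .408, .280, .241],
    [.479, .530, .425, .311, .311],
    [.237, .243, .304, .718, .730],
    [.212, .226, .291, .681, .661],
])

def inter_battery_factors(R12, u):
    """Return gamma, W1, W2, A1, A2 for the u largest roots."""
    H1 = R12 @ R12.T                          # (6.1)
    roots, vectors = np.linalg.eigh(H1)       # roots in ascending order
    keep = np.argsort(roots)[::-1][:u]        # indices of the u largest roots
    gamma = np.sqrt(roots[keep])
    W1 = vectors[:, keep]
    W1 = W1 * np.sign(W1.sum(axis=0))         # assumed sign convention
    W2 = (R12.T @ W1) / gamma                 # (15.1)
    A1 = W1 * np.sqrt(gamma)                  # (23.1)
    A2 = W2 * np.sqrt(gamma)                  # (23.2)
    return gamma, W1, W2, A1, A2

gamma, W1, W2, A1, A2 = inter_battery_factors(R12, u=2)
residual = R12 - A1 @ A2.T                    # (26)

print(np.round(gamma ** 2, 4))                # characteristic roots of H1
print(np.round(A1, 3))
print(np.round(A2, 3))
print(round(float(np.sum(residual ** 2)), 4)) # residual sum of squares, (22)
```

The unit-length check of step 6 corresponds to `np.sum(W2 ** 2, axis=0)`, which should be close to 1 for every column.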
It is to be noted that the factor matrices $A_1$ and $A_2$ may be subjected
to transformations of axes. The rotational problem of the regular methods
still exists. More material on this point will be given in a later section of
this paper.

Maximum Inter-Battery Covariances

Another interpretation for the $W$ matrices is of interest. Each column
vector, or factor, in these matrices gives a set of weights that may be applied
to the standard scores for the corresponding batteries to produce composite
scores. Consider the standard score matrices $S_1$ and $S_2$ for the two batteries.
Tests are row vectors and individuals are column vectors. Entries in $S_1$ are
$s_{ji}$, and entries in $S_2$ are $s_{ki}$. Let

(29.1)    $x_{f1i} = \sum_{j=1}^{n_1} w_{jf} s_{ji}$,

(29.2)    $x_{f2i} = \sum_{k=1}^{n_2} w_{kf} s_{ki}$.

$x_{f1i}$ are scores on a weighted composite of scores in battery 1, and
$x_{f2i}$ are scores on a weighted composite of scores in battery 2.
The weights $w_{jf}$ and $w_{kf}$ are entries in column $f$ of $W_1$ and of $W_2$,
respectively. The covariance between $x_{f1i}$ and $x_{f2i}$, designated by $c_{f12}$, is

(30)    $c_{f12} = \frac{1}{N} \sum_{i=1}^{N} x_{f1i} x_{f2i}$,

where $N$ is the number of individuals in the sample. The weights will be
determined so as to maximize this covariance.

Instead of defining the weights in such a way that the variances of $x_{f1i}$
and $x_{f2i}$ are some desired constants, the weight vectors $W_{f1}$ and $W_{f2}$ will be
defined here as unit vectors. The former definition leads to the canonical
correlation of Hotelling's most predictable criterion [16]. Definition of
variances involves the intercorrelation matrices $R_{11}$ and $R_{22}$ and, thus, the
variances or communalities of the tests. In order to avoid the communality
problem, the weight vectors may be arbitrarily limited to unit vectors. Some
limit must be placed on the weights, for otherwise $c_{f12}$ could be made
increasingly large by use of increasingly large weights. This restriction is expressed
in (28.1) and (28.2).



The solution for maximum covariance, $c_{f12}$, under the restrictions of unit
weight vectors involves the matrix $H_1$ of equation (6.1) [or matrix $H_2$ of
(6.2)] and its roots and vectors of (10.1). The weight vector $W_{f2}$ is determined
as in (15.1). There are as many orthogonal solutions as there are tests in the
smaller battery. It turns out that

(31)    $c_{f12} = \gamma_f$;

thus, the maximum $c_{f12}$ is obtained by choosing the largest root. As many
pairs of weight vectors and resulting composite scores may be taken as there
are significant roots $\gamma_f$. An interesting property of this solution is that

(32)    $\sum_{i=1}^{N} x_{f1i} x_{g2i} = 0, \quad (f \neq g)$.

Thus, each composite from one battery is uncorrelated with all composites,
except the corresponding one, from the other battery. Each successive pair
of weight vectors, $W_{f1}$ and $W_{f2}$, for successively smaller roots $\gamma_f$, involves
the maximum inter-battery covariance remaining after, and independent of,
the covariances for the preceding pairs of weight vectors.
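
Properties (31) and (32) can be illustrated on simulated scores. The sketch below is an assumption-laden illustration, not data from the paper: the score matrices are randomly generated from two hypothetical common factors, and the weight vectors are taken from NumPy's singular value decomposition of $R_{12}$, which gives the same roots and unit vectors as the characteristic-root solution.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n1, n2 = 400, 4, 5                         # hypothetical sample and battery sizes

# Simulated raw scores sharing two common factors (illustration only).
common = rng.standard_normal((2, N))
S1 = rng.standard_normal((n1, 2)) @ common + rng.standard_normal((n1, N))
S2 = rng.standard_normal((n2, 2)) @ common + rng.standard_normal((n2, N))

def standardize(S):
    """Standard scores: each test (row) has mean 0 and variance 1."""
    return (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)

S1, S2 = standardize(S1), standardize(S2)
R12 = S1 @ S2.T / N                           # cross-battery correlations

# Unit weight vectors: columns of U are W_f1, rows of Vt are W_f2'.
U, gamma, Vt = np.linalg.svd(R12, full_matrices=False)

X1 = U.T @ S1                                 # composites x_f1i, as in (29.1)
X2 = Vt @ S2                                  # composites x_f2i, as in (29.2)
C = X1 @ X2.T / N                             # covariances between all pairs

print(np.round(C, 4))                         # diagonal: gamma_f; off-diagonal: 0
print(np.round(gamma, 4))
```

The diagonal of `C` reproduces the roots $\gamma_f$ of (31), and the vanishing off-diagonal entries correspond to (32).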


Test of Significance for Factors Determined 


An approximate significance test for the minimum number of factors 
determinable from any given large sample data has been developed for the 
inter-battery method. While this test seems reasonable on intuitive, logical, 
and geometrical grounds, a completely rigorous mathematical development
of this test, or any variant of it, has not been found. The grounds for the 
reasonableness of the proposed statistical test will be discussed in the sub- 
sequent section. The nature of this test will be presented in this section. 

Table 4 gives the calculations of the proposed statistical test when 
applied to the illustrative problem. Three constants of the problem are given 
in the line immediately under the heading. These are: N, the number of 
people on which the correlations are based; n, , the number of tests in battery 
1; and n, , the number of tests in battery 2. In the body of the table 
there is a column for the original data and one column for data after the 
determination of each factor. Row 1 gives the headings for these columns in 
terms of the factors already determined, f and u being used as subscripts to 
designate factor number. Rows 2 and 3 give the number of tests in the two 
batteries decreased by the number of factors already determined as indicated 
by the column headings. Row 4 gives the products of the number of tests 
decreased by the number of factors. These values are interpreted as the 
number of degrees of freedom for a $\chi^2$ distribution.

Rows 5 and 6 repeat information from Table 3. Row 5 contains the
roots $\gamma_f^2$ for the factors. Row 6 contains the sums of squares of the correlations
in $R_{12}$ and the sums of squares of the residual correlations after each factor.


LEDYARD R TUCKER 128 
TABLE 4
Statistical Significance Tests for Inter-Battery Reference Factors
(N = [illegible], n_1 = 4, n_2 = 5)

 Row  Quantity                                    u = 0        u = 1        u = 2
  1.  f, u                                          0            1            2
  2.  (n_1 - u)                                     4            3            2
  3.  (n_2 - u)                                     5            4            3
  4.  (n_1 - u)(n_2 - u) = d.f.                    20           12            6
  5.  Root gamma_f^2                                          3.5364        .4562
  6.  Sum of squares of (residual) correlations   3.9934        .4570        .0008
  7.  S_f1^2                                               [illegible]      1.1820
  8.  n_1 - sum of S_f1^2                         4.0000   [illegible]  [illegible]
  9.  S_f2^2                                                  2.5604       1.1095
 10.  n_2 - sum of S_f2^2                         5.0000   [illegible]  [illegible]
 11.  (row 8)(row 10)                            20.0000       4.5262        .9010
 12.  Phi_u                                   [illegible]  [illegible]  [illegible]
 13.  chi square for p = .01                       37.6         26.2         16.8
 14.  p                                            < .01        < .01   [illegible]
Rows 7 and 9 contain the variances $S_{f1}^2$ and $S_{f2}^2$ of the composite
scores $x_{f1i}$ and $x_{f2i}$ of (29) on the factors for the two batteries.
These variances may be obtained from the characteristic vectors $W_{f1}$ and
$W_{f2}$ by the following matrix equations:

(33.1)    $S_{f1}^2 = W_{f1}' R_{11} W_{f1}$,


(33.2)    $S_{f2}^2 = W_{f2}' R_{22} W_{f2}$.


It is noted that the intercorrelation matrices $R_{11}$ and $R_{22}$ for the
two batteries are employed. Unity is used in the diagonal cells of these
matrices and not the communalities. Rows 8 and 10 contain the total test
variances in each battery decreased by the factor score variances for the
factors already determined. Row 11 contains the products of the entries in
rows 8 and 10.

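A coefficient $\Phi_u$ appears in row 12 and is computed from $N$ and entries
in rows 4, 6, and 11. $\Phi_u$ is given by


(34)    $\Phi_u = \dfrac{N (n_1 - u)(n_2 - u) \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk \cdot u}^2}{\left(n_1 - \sum_{f=1}^{u} S_{f1}^2\right)\left(n_2 - \sum_{f=1}^{u} S_{f2}^2\right)},$

where the $r_{jk \cdot u}$ are the residual correlations after $u$ factors.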
In case:


i. the first $u$ factors are well determined (have large roots),

ii. the population correlations depend only on the first $u$ factors,

iii. the population density function is multinormal,

iv. the sample used for the analysis is large and random,


then $\Phi_u$ is approximately distributed as $\chi^2$ with $(n_1 - u)(n_2 - u)$
degrees of freedom. The $\chi^2$ with these degrees of freedom for a $p$ of .01
is given in line 13. In line 14, the value of $p$ is listed corresponding to
$\Phi_u$. The results for our example indicate clearly that the first two
factors are justified but that the second factor residuals are so small as not
to justify any further factors.
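
The row arithmetic of Table 4 can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name `phi_u` and its
argument names are hypothetical, the sample size N = 500 and the variances in
the second call are invented for the example, and the sum of squares 3.9934 is
the row 6 entry for the original correlations.

```python
def phi_u(n_people, n1, n2, u, resid_sumsq, s1_sq, s2_sq):
    """Return (Phi_u, degrees of freedom) after u factors.

    resid_sumsq : sum of squares of the (residual) correlations (row 6)
    s1_sq, s2_sq: composite variances for the factors in batteries 1 and 2
                  (rows 7 and 9); only the first u entries are used
    """
    df = (n1 - u) * (n2 - u)                    # row 4
    remaining1 = n1 - sum(s1_sq[:u])            # row 8
    remaining2 = n2 - sum(s2_sq[:u])            # row 10
    phi = n_people * df * resid_sumsq / (remaining1 * remaining2)  # row 12
    return phi, df


# Hypothetical sample size; 3.9934 is the row 6 entry for the original data.
phi0, df0 = phi_u(500, 4, 5, 0, 3.9934, [], [])

# Invented variances, used only to show the call after one factor.
phi1, df1 = phi_u(500, 4, 5, 1, 0.4570, [2.0], [2.5])
```

With no factors removed the denominator equals $n_1 n_2$, so $\Phi_0$ reduces to
$N$ times the sum of squared correlations, in agreement with (43).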

In the case of each table of correlations or residual correlations a
hypothesis is made that the coefficients deviate from zero due only to sampling
of individuals. That is, the hypothesis is made that there are no further
factors. Acceptance of such a hypothesis does not indicate, however, that
there are no factors involved for the population. Such factors might be found
with more extensive data. The statistical test must be understood, therefore,
as a minimum test which helps support the idea of existence of factors, but
does not necessarily negate an idea of further factors.


Justification of Statistical Test 


The statistical test involves a chi square approximation to the theoretical
distribution for correlation coefficients. Fisher [10] found, for a zero
population correlation and a normal bivariate distribution of variables, the
distribution of sample coefficients to be


(35)    $f(r)\,dr = k_1 (1 - r^2)^{(N-4)/2}\,dr.$

Let

(36)    $\phi = N r^2.$

Then

(37)    $dr = \tfrac{1}{2}\,\phi^{-1/2} N^{-1/2}\,d\phi.$

Substitution of (36) and (37) into (35) yields

(38)    $f(\phi)\,d\phi = \tfrac{1}{2} k_1 N^{-1/2} \left(1 - \dfrac{\phi}{N}\right)^{(N-4)/2} \phi^{-1/2}\,d\phi.$

Note that as $N \to \infty$

(39)    $\left(1 - \dfrac{\phi}{N}\right)^{(N-4)/2} \to e^{-\phi/2}.$

Define

(40)    $k_2 = \tfrac{1}{2} k_1 N^{-1/2}.$

Substitution of (39) and (40) into (38) yields

(41)    $f(\phi)\,d\phi = k_2\, \phi^{-1/2} e^{-\phi/2}\,d\phi.$


Equation (41) is the distribution for $\chi^2$ with one degree of freedom. A
comparison of the results of (35) and (41) indicates that, for $N$'s of 100 or
more, values of $r$ for $p = .01$ as determined from the two distributions
differ by no more than .001.
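
The closeness of the two distributions can be examined numerically. The sketch
below is an illustration rather than the author's computation: it integrates
the density $(1 - r^2)^{(N-4)/2}$ by Simpson's rule to find the value of $|r|$
exceeded with two-tailed probability $p$, and compares it with the value
implied by chi square with one degree of freedom, $r = z/\sqrt{N}$. All function
names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def exact_critical_r(n, p=0.01):
    """|r| exceeded with two-tailed probability p when rho = 0."""
    expo = (n - 4) / 2.0

    def dens(r):
        return (1.0 - r * r) ** expo

    half_area = simpson(dens, 0.0, 1.0)
    lo, hi = 0.0, 1.0
    for _ in range(40):                     # bisection on the tail area
        mid = (lo + hi) / 2.0
        if simpson(dens, mid, 1.0) / half_area > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi_square_critical_r(n, p=0.01):
    """r implied by phi = N r^2 taken as chi square with 1 d.f."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)   # chi square (1 d.f.) equals z^2
    return z / sqrt(n)


diff_100 = chi_square_critical_r(100) - exact_critical_r(100)
diff_400 = chi_square_critical_r(400) - exact_critical_r(400)
```

The difference is of the order of .001 at $N = 100$ and shrinks as $N$ grows.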
In order to apply the foregoing frequency distribution to an entire
correlation matrix of the form of $R_{12}$ it is necessary to inquire into the
possible statistical dependence of the several coefficients of correlation
involving common variables. In case these coefficients were independent, the
reproductive property of the chi square distribution would permit the
summation of the several $\phi$'s, which would yield a new coefficient
distributed as chi square with as many degrees of freedom as the number of
$\phi$'s summed. In the case that all population coefficients of correlation in
matrices $R_{11}$ and $R_{22}$, as well as in matrix $R_{12}$, are zero and the
samples are large, the sample coefficients of correlation in the matrix
$R_{12}$ seem adequately independent to warrant use of the foregoing sum of the
$\phi$'s, as presented in the following paragraphs.

Consider first the case of one variable, $j$, in battery 1 and two variables,
$k$ equal to $a$ and $b$, in battery 2. Let samples of $N$ scores for each
variable be drawn randomly and independently from a normally distributed
population. The correlation of variable $b$ with variable $j$ will then be
distributed independently of the correlation of variable $a$ with variable $j$.
For any particular sample, when determining $r_{jb}$, it does not matter how
correlated variable $a$ was with variable $j$. It follows then that
$\phi_{ja}$ and $\phi_{jb}$ are independently distributed, also. It does not
follow, however, that $\phi_{ab}$ would be independently distributed from
$\phi_{ja}$ and $\phi_{jb}$. Any two of these coefficients are independently
distributed, but the complete set involves some dependence of distribution. By
employing only the inter-battery correlations in matrix $R_{12}$, such complete
sets of intercorrelations are avoided. Therefore, the entries in any particular
row or particular column of $R_{12}$ and the corresponding $\phi$'s are
independently distributed under the hypothesis that all population
correlations, including the intercorrelations for each battery, are zero.


Consider next the case of two variables, $j$ equal to 1 and 2, in battery 1
and two variables, $k$ equal to $a$ and $b$, in battery 2. Repeated samplings
of $N$ scores for each variable, drawn independently for each variable, from a
normal universe will yield distributions of the coefficients $\phi_{1a}$,
$\phi_{1b}$, $\phi_{2a}$, $\phi_{2b}$. We have already shown that the pairs
$\phi_{1a}$ - $\phi_{1b}$, $\phi_{2a}$ - $\phi_{2b}$, $\phi_{1a}$ - $\phi_{2a}$,
and $\phi_{1b}$ - $\phi_{2b}$ are independently distributed. The pairs
$\phi_{1a}$ - $\phi_{2b}$ and $\phi_{1b}$ - $\phi_{2a}$ are also independently
distributed since the second member of the pair involves variables distinct
from the first member. Consider any triplet of coefficients such as
$\phi_{1a}$ - $\phi_{1b}$ - $\phi_{2a}$. The distribution of each of these
coefficients is independent of the distribution of the other two members of
the triplet since, when any two are fixed, the third retains complete freedom
of distribution. For example, let the scores on variable 1 be identical with
the scores on variable $b$ and let the scores on variable 2 be identical with
the scores on variable $a$. This fixes $\phi_{1b}$ and $\phi_{2a}$ at their
maximum possible values. The distribution of the correlation between variable 1
and variable $a$ is not influenced by the preceding identities of scores. Thus,
$\phi_{1a}$ is independently distributed. There is some dependence of
distribution when all four coefficients are considered together, but this
dependence will tend to disappear with increase in sample size.
Let $\Phi$ be the sum of the $\phi$'s:


(42)    $\Phi = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} \phi_{jk}$,

or from (36),


(43)    $\Phi = N \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2$.


The foregoing material indicates that there is no dependence among the
several $\phi$'s which affects the first three moments of the distribution of
$\Phi$, that is, the mean, variance, and skewness of $\Phi$. Higher moments
are affected but to a reduced extent with larger samples. It seems appropriate
to employ the chi square distribution in evaluating $\Phi$. There will be
$n_1 n_2$ degrees of freedom.
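
A small Monte Carlo sketch can illustrate (43) under the null hypothesis. The
settings below (200 people, batteries of 2 and 3 variables, 300 replications,
a fixed seed) and all function names are assumptions chosen for the example;
the only claim checked is that the mean of $\Phi$ lies near $n_1 n_2$, the mean
of a chi square variable with that many degrees of freedom.

```python
import random
from math import sqrt


def corr(x, y):
    """Product-moment correlation of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def total_phi(battery1, battery2):
    """Equation (43): N times the sum of squared inter-battery correlations."""
    n_people = len(battery1[0])
    return n_people * sum(corr(x, y) ** 2 for x in battery1 for y in battery2)


def simulate_mean_phi(n_people=200, n1=2, n2=3, reps=300, seed=1):
    """Mean of Phi over repeated samples of independent normal scores."""
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(n_people)]

    values = []
    for _ in range(reps):
        b1 = [draw() for _ in range(n1)]
        b2 = [draw() for _ in range(n2)]
        values.append(total_phi(b1, b2))
    return sum(values) / reps


mean_phi = simulate_mean_phi()
```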




In order to make a statistical test after each factor has been determined,
and before going on to the succeeding factors, it is necessary to make an
adjustment in the preceding test. The sum of squares of the residual
correlations, $r_{jk \cdot u}$, after $u$ factors may be obtained easily from
(22) when the roots of the matrix H are known. But these residual correlations
refer to residual variables with lowered variances and involve dependencies
due to the use of some degrees of freedom in the factors already determined.
The necessary adjustments will be discussed for the general case when $u$
factors have been established and it is desired to test the residual matrix
$R_{12 \cdot u}$ for justification of any further factors.

Let $V_1$ be an $n_1 \times n_1$ orthogonal matrix containing as its first $u$
columns the vectors $W_{f1}$ $(f = 1, 2, \cdots, u)$. Similarly, let $V_2$ be an
$n_2 \times n_2$ orthogonal matrix containing as its first $u$ columns the
vectors $W_{f2}$ $(f = 1, 2, \cdots, u)$. Let the column vectors of $V_1$ be
$v_p$ and the column vectors of $V_2$ be $v_q$. Suppose the vectors $v_p$ and
$v_q$ contain weights for producing weighted composites for batteries 1 and 2
in a manner similar to use of $W_{f1}$ and $W_{f2}$ in (29.1) and (29.2). Thus,


(44.1)    $x_{p1i} = \sum_{j=1}^{n_1} v_{jp}\, s_{j1i}$,

(44.2)    $x_{q2i} = \sum_{k=1}^{n_2} v_{kq}\, s_{k2i}$.

The matrix of intercovariances for each battery may be obtained from the
following matrix equations:

(45.1)    $C_{pp} = V_1' R_{11} V_1$,

(45.2)    $C_{qq} = V_2' R_{22} V_2$,

where $C_{pp}$ is the matrix of intercovariances of composites $p$ for battery
1 and $C_{qq}$ is the matrix of intercovariances of composites $q$ for battery
2. The covariances $c_{pq}$ between composites $p$ and $q$ (inter-battery
covariances) may be obtained from

(46)    $C_{pq} = V_1' R_{12} V_2$.

This matrix will be of the form

(47)    $C_{pq} = \begin{bmatrix} \Gamma & 0 \\ 0 & C_{pq\,(p,q>u)} \end{bmatrix}$,

where $\Gamma$ is a diagonal matrix containing as diagonal entries $\gamma_f$
$(f = 1, 2, \cdots, u)$. Since $V_1$ and $V_2$ are orthogonal matrices,

(48)    $\sum_{p=1}^{n_1} \sum_{q=1}^{n_2} c_{pq}^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2$.



As a consequence of the form of $C_{pq}$ given in (47), and (22) and (48),


(49)    $\sum_{p=u+1}^{n_1} \sum_{q=u+1}^{n_2} c_{pq}^2 = \sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2$

(50)    $\qquad = \sum_{f > u} \gamma_f^2$.


Equation (49) yields an efficient procedure for determining the sum of squares
of the entries in the $C_{pq}$ $(p, q > u)$ matrix. It is to be noted that this
matrix is of size $(n_1 - u) \times (n_2 - u)$.

In order to apply the chi square test of (36) and (41) it is necessary to
obtain the $r_{pq}$ from the $c_{pq}$. To do this, the corresponding variances
are required. Since $V_1$ is an orthogonal matrix, the trace (sum of diagonal
entries) of $R_{11}$ yields the trace of $C_{pp}$ in (45.1). Note that the
trace of $R_{11}$ is $n_1$ since all diagonal entries in $R_{11}$ are unity.
Thus,


(51.1)    $\sum_{p=1}^{n_1} c_{pp} = n_1$.

From our definition, $V_1$ contains the vectors $W_{f1}$ as its first $u$
columns; the corresponding variances $S_{p1}^2$ are


(52.1)    $c_{pp} = S_{p1}^2 = W_{p1}' R_{11} W_{p1}$, $\quad (p \le u)$.


Equations (51.1) and (52.1) yield


(53.1)    $\sum_{p=u+1}^{n_1} c_{pp} = n_1 - \sum_{f=1}^{u} S_{f1}^2$, $\quad (p > u)$.


Similarly for battery 2,


(52.2)    $c_{qq} = S_{q2}^2 = W_{q2}' R_{22} W_{q2}$, $\quad (q \le u)$;

(53.2)    $\sum_{q=u+1}^{n_2} c_{qq} = n_2 - \sum_{f=1}^{u} S_{f2}^2$, $\quad (q > u)$.


If the vectors $v_p$ $(p > u)$ are chosen so that the $c_{pp}$ equal a
constant, then from (53.1),


(54.1)    $c_{pp} = \dfrac{1}{n_1 - u}\left(n_1 - \sum_{f=1}^{u} S_{f1}^2\right)$, $\quad (p > u)$.


Similarly for battery 2,


(54.2)    $c_{qq} = \dfrac{1}{n_2 - u}\left(n_2 - \sum_{f=1}^{u} S_{f2}^2\right)$, $\quad (q > u)$.

The relation between $r_{pq}$ and $c_{pq}$ is


(55)    $r_{pq} = \dfrac{c_{pq}}{\sqrt{c_{pp}\, c_{qq}}}$, $\quad (p, q > u)$.


From (36)


(56)    $\phi_{pq} = N \dfrac{c_{pq}^2}{c_{pp}\, c_{qq}}$.


A total coefficient, $\Phi_u$, may be obtained by rewriting (43) as


(57)    $\Phi_u = \sum_{p=u+1}^{n_1} \sum_{q=u+1}^{n_2} \phi_{pq}$.


Substitution from (49), (54.1), (54.2), and (56) yields (34) and its equivalent


(58)    $\Phi_u = \dfrac{N (n_1 - u)(n_2 - u)\left[\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r_{jk}^2 - \sum_{f=1}^{u} \gamma_f^2\right]}{\left(n_1 - \sum_{f=1}^{u} S_{f1}^2\right)\left(n_2 - \sum_{f=1}^{u} S_{f2}^2\right)}$.


In case, then, when the population covariances $c_{pq}$ (between batteries),
$c_{pp'}$ (within battery 1), and $c_{qq'}$ (within battery 2) are all zero for
$p, q > u$, $\Phi_u$ is approximately distributed as $\chi^2$ with
$(n_1 - u)(n_2 - u)$ degrees of freedom. This series of statistical tests for a
minimum number of factors is parallel to the statistical test previously
discussed for the original correlation matrix. Each of these tests yields an
indication whether any more factors are justified by the given data beyond
those factors already determined.


Rotation of Axes 


As noted earlier, the matrices $A_1$ and $A_2$ may be submitted to
transformations such that (5) will still be true for the transformed matrices.
Let $M_1$ and $M_2$ be $u \times u$, nonsingular transformation matrices. $u$
is now taken as the number of common factors. Let

(59.1)    $B_1 = A_1 M_1$,

(59.2)    $B_2 = A_2 M_2$.


Substitution of (59.1) and (59.2) into (5) yields

(60)    $R_{12} = B_1 (M_2' M_1)^{-1} B_2'$.


In case $M_2'$ is the inverse of $M_1$, (60) will simplify to the form of (5)
with

(61)    $R_{12} = B_1 B_2'$.

Since $M_1$ and $M_2$ are not restricted to being orthogonal matrices, the
transformation problem has a more general form than occurs for the usual
factor analysis. A particular special case of interest occurs when $M_1$ and
$M_2$ are diagonal matrices. The effect is to change the entries in the columns
of $A_1$ and $A_2$ proportionately. When one column of $A_1$ is multiplied by
one constant, the corresponding column of $A_2$ is multiplied by the reciprocal
of this constant. Thus, the scaling of the columns of factor loadings is not
unique.
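
The algebra of (59) and (60) can be checked with a small numerical sketch. It
assumes that (5) has the form $R_{12} = A_1 A_2'$; the unrotated loadings
`A1` and `A2` are invented for illustration, and the helper functions are
hypothetical. The transformation matrices are the $M_1$ and $M_2$ entries as
read from Table 5.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Transformation matrices as read from Table 5 (rows I, II; columns A, B).
M1 = [[0.434, 0.598], [0.901, -0.802]]
M2 = [[0.440, 0.633], [0.898, -0.774]]

# Hypothetical unrotated loadings, invented for this illustration only.
A1 = [[0.8, 0.3], [0.7, -0.2], [0.6, 0.4]]
A2 = [[0.9, 0.1], [0.5, -0.5]]

B1 = matmul(A1, M1)                                    # (59.1)
B2 = matmul(A2, M2)                                    # (59.2)
middle = inv2(matmul(transpose(M2), M1))               # (M2' M1)^{-1}
R12_from_A = matmul(A1, transpose(A2))                 # assumed form of (5)
R12_from_B = matmul(matmul(B1, middle), transpose(B2))  # (60)
```

The two products agree to rounding error, and `middle` reproduces the
$(M_2' M_1)^{-1}$ matrix of Table 5 to about three decimals, including its
slight lack of symmetry.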

Consider a second special case where $(M_2' M_1)^{-1}$ is symmetric and has
unity in the diagonal cells. If


(62)    $R_{pq} = (M_2' M_1)^{-1}$,

(60) becomes

(63)    $R_{12} = B_1 R_{pq} B_2'$,


which corresponds to the fundamental factor equation for correlated factors
([26], p. 354).

The preceding cases require that the matrix $(M_2' M_1)^{-1}$ be symmetrical.
No satisfactory method has been devised for transformation of the matrices
which incorporates this restriction. In one sense, however, it is desirable to
treat the matrices $A_1$ and $A_2$ separately. Any lack of conformity in these
two operations will result in lowered factor reliability coefficients. In a
critical evaluation of factorial stability, equivalence of transformations as
between the two batteries constitutes an important aspect. Low factor
reliability coefficients may reflect, in part, nonequivalent transformations of
the factor matrices. Greater assurance of stable factor determination is given
by high factor reliability coefficients.


Figure 1 presents the graphs used for each of the two batteries. Each of the
factors in the $A$ matrix was taken as an orthogonal axis and points were
plotted for each variable using the entries in the $A$ matrix as coordinates.
Oblique lines were drawn through clusters of points and the normals to these
lines were constructed. The new factors were assigned the code identification
of $A_1$ and $B_1$ for battery 1 and $A_2$ and $B_2$ for battery 2. The factors
$A_1$ and $A_2$ have word fluency tests (see Table 1) with nonzero projections
on the normals. The factors $B_1$ and $B_2$ have verbal tests with nonzero
projections on their normals. The factors for the two batteries were paired as
indicated. The direction cosines of the normals are given in the $M$ matrices
of Table 5. The $B$ matrices were computed by (59.1) and (59.2). The entries in
the $B$ matrices are the projections on the normals drawn on the graphs of
Figure 1. The matrix $(M_2' M_1)^{-1}$ is given also in Table 5. It will be
noted that this matrix does not have unity in the diagonal cells and is not
symmetric. Some slight differences are indicated in the comparability of
rotation of axes for the two batteries by the lack of symmetry.

Factor Reliability Coefficient 


The final phase of the inter-battery method is the determination of the
factor reliability coefficients. In this context, the entries in the $B$
matrices will be considered as regression weights for predicting scores on the
given variables in one battery from composite scores obtained from the
variables in the other battery. The predictor composite scores are determined
from a transformation of the $W$ matrices, this transformation depending on the
$M$ matrices used in transforming the $A$ matrices. Let, in this context,


(64.1)    $V_1 = W_1 T_1$,

(64.2)    $V_2 = W_2 T_2$.

FIGURE 1
Graphs for Rotation of Factors


TABLE 5

Rotated Inter-Battery Factors

Battery 1

 M_1 (Transformation Matrix)       B_1 (Projections on Normals)
          A_1     B_1              Test           A_1           B_1
 I       .434    .598              42            .670         -.029
 II      .901   -.802              54        [illegible]       .023
                                   [illegible]   .003     [illegible]
                                   46            .003          .715

Battery 2

 M_2 (Transformation Matrix)       B_2 (Projections on Normals)
          A_2     B_2              Test           A_2           B_2
 I       .440    .633              23        [illegible]       .003
 II      .898   -.774              24            .630         -.014
                                   27        [illegible]  [illegible]
                                   10            .015          .751

 (M_2' M_1)^{-1}
          A_2     B_2
 A_1    1.240    .567
 B_1     .525   1.241


The definitions of the $V$ matrices are different from those used in the
section on the statistical test for the factors determined. Equations (44) to
(46), however, still apply to these $V$ matrices. The covariances of the tests
in one battery with the composites in the other battery, as determined by the
$V$'s, are

(65.1)    $C_{jq} = R_{12} V_2$,

(65.2)    $C_{kp} = R_{21} V_1$.

As before, tests $j$ and composites $p$ are for battery 1, tests $k$ and
composites $q$ are for battery 2. Let $B_{jq}$ be the matrix of regression
weights for predicting scores on tests $j$ in battery 1 from the composites $q$
in battery 2. Then, from standard regression theory,


(66.1)    $B_{jq} = C_{jq} C_{qq}^{-1}$.

Similarly,

(66.2)    $B_{kp} = C_{kp} C_{pp}^{-1}$.


It is desired to define the $T$ matrices such that the rotated factor matrices
$B_1$ and $B_2$ are the matrices of regression weights.


(67.1)    $B_1 = B_{jq}$,

(67.2)    $B_2 = B_{kp}$.

This may be accomplished by defining

(68.1)    $T_1 = (W_1' R_{11} W_1)^{-1}\, \Gamma^{1/2}\, (M_2')^{-1}$,

(68.2)    $T_2 = (W_2' R_{22} W_2)^{-1}\, \Gamma^{1/2}\, (M_1')^{-1}$.


The appropriateness of these definitions may be checked by matrix algebra
using equations (15), (23), (45), (59), (64), (65), (66), (67), and (68).
Because of the length of this matrix manipulation it will not be presented
here.

A simplified formula for computation of $C_{pp}$ may be obtained by
substitution of (64.1) and (68.1) into (45.1). Then


(69.1)    $C_{pp} = M_2^{-1}\, \Gamma^{1/2}\, T_1$.

Similarly

(69.2)    $C_{qq} = M_1^{-1}\, \Gamma^{1/2}\, T_2$.

A similar substitution into (46) yields

(70)    $C_{pq} = T_1' \Gamma T_2$.


Scores on the composites $p$ and $q$ as defined in (44.1) and (44.2) from
the $V$ matrices will be considered as scores on the factors. Justification of
this step lies in the fact that the basic postulate of the relation of observed
scores to factor scores corresponds to a linear regression for prediction of
the observed scores from the factor scores. This is precisely the situation in
the present case. The entries in $B_1$ are the regression weights for
predicting the observed scores $s_{j1}$ for battery 1 from the composite scores
$x_{q2}$ for battery 2. Similarly, the entries in $B_2$ are the regression
weights for predicting the observed scores $s_{k2}$ for battery 2 from the
composite scores $x_{p1}$ for battery 1.

As a consequence, the correlations between the composites $x_{p1}$ and
$x_{q2}$ are the inter-battery factor correlations. These correlations may be
entered in a matrix $R_{pq}$, which may be determined from the covariance
matrix $C_{pq}$. Corresponding variances of the composites appear in the
diagonal cells of the matrices $C_{pp}$ and $C_{qq}$. The correlations between
corresponding factors in the two batteries may be interpreted as the factor
reliability coefficients.
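
The step from the covariance matrices to the factor correlations can be
sketched as follows. The function name is hypothetical; the numerical call uses
only the factor B entries reported for the illustrative problem (covariance
.8390 and variances .9912 and 1.0536), passed as 1 x 1 matrices.

```python
from math import sqrt


def factor_correlations(c_pq, c_pp, c_qq):
    """Divide each covariance c_pq by the square root of the product of the
    corresponding composite variances (diagonals of C_pp and C_qq)."""
    return [[c_pq[p][q] / sqrt(c_pp[p][p] * c_qq[q][q])
             for q in range(len(c_qq))]
            for p in range(len(c_pp))]


# Factor B only: inter-battery covariance and the two composite variances.
r_bb = factor_correlations([[0.8390]], [[0.9912]], [[1.0536]])[0][0]
```

Rounded to three places, `r_bb` agrees with the coefficient of .821 reported
for factor B.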

Table 6 gives the two $T$ matrices, the $C$ matrices, and the $R_{pq}$ matrix
for the illustrative problem. The $T$ matrices were computed by (68) from
values obtained during the factoring and the rotation of axes. Equations (69)
and (70) were employed to obtain the $C$ matrices. Variances for the factors
are in the diagonal cells of $C_{pp}$ and $C_{qq}$, which are used along with
the entries in $C_{pq}$ to determine the correlations in $R_{pq}$. The factor
reliability coefficients are located in the diagonal cells of $R_{pq}$.


TABLE 6

Inter-Battery Factor Correlations

 T_1          A_1          B_1           T_2          A_2          B_2
 I_1         .5457        .6329          I_2      [illegible]     .6050
 II_1    [illegible]     -.3362          II_2     [illegible]  [illegible]

 C_pp         A_1          B_1           C_qq         A_2          B_2
 A_1     [illegible]      .5066          A_2      [illegible]     .4597
 B_1         .5066        .9912          B_2         .4597       1.0536

 C_pq         A_2          B_2           R_pq         A_2          B_2
 A_1         .5766        .2393          A_1         .693     [illegible]
 B_1         .4372        .8390          B_1         .505         .821

The coefficients of .693 for factor A and .821 for factor B, when compared
with test reliabilities, seem low. Whether they are to be considered as low for
factors depends upon further experience and consideration of their
characteristics in terms of the purpose of the experimenter. Use of tests with
low reliability will lower the factor reliability coefficients. If the tests
have low communalities, the factor reliability coefficients will also be low.
Large measurement error variances and large specific factor variances will
result in low factor reliability coefficients. In the present example it is
apparent that stability of the determination of the word fluency factor is
considerably less than that of the verbal factor. This indicates the need for
construction of better tests for word fluency.




Use of the inter-battery method of factor analysis as presented in the
preceding material should help in firmly establishing factors for the
description of human behavior. The controversial point of the estimation of
communalities is completely avoided. A statistical test for judging the minimum
number of factors justified by the data is provided. The factor reliability
coefficients should be of assistance as an indication of the extent that
factors can be determined from different test material. This latter is related
to the psychological problem of generalizing factorial results to other
behavior of people than is involved directly in the tests employed in the
analysis.


REFERENCES 


[1] Bartlett, M. S. Internal and external factor analysis. Brit. J. Psychol.,
    Statist. Sect., 1948, 1, 73-81.
[2] Bartlett, M. S. Tests of significance in factor analysis. Brit. J.
    Psychol., Statist. Sect., 1950, 3, 77-85.
[3] Bartlett, M. S. The effect of standardization on a $\chi^2$ approximation
    in factor analysis. Biometrika, 1951, 38, 337-344.
[4] Bartlett, M. S. A further note on tests of significance in factor analysis.
    Brit. J. Psychol., Statist. Sect., 1951, 4, 1-2.
[5] Burt, C. A comparison of factor analysis and analysis of variance. Brit. J.
    Psychol., Statist. Sect., 1947, 1, 3-20.
[6] Burt, C. Tests of significance in factor analysis. Brit. J. Psychol.,
    Statist. Sect., 1952, 5, 109-133.
[7] Cattell, R. B. A universal index for psychological factors. Advance
    Publication No. 3, 1953. Laboratory of Personality Assessment and Group
    Behavior, Dept. Psychol., Univ. Illinois.
[8] Eckart, C. and Young, G. The approximation of one matrix by another of
    lower rank. Psychometrika, 1936, 1, 211-218.
[9] Emmett, W. G. Factor analysis by Lawley's method of maximum likelihood.
    Brit. J. Psychol., Statist. Sect., 1949, 2, 90-97.
[10] Fisher, R. A. Frequency distribution of the values of the correlation
    coefficient in samples from an indefinitely large population. Biometrika,
    1915, 10, 507-521.
[11] French, J. W. The description of aptitude and achievement tests in terms
    of rotated factors. Psychometric Monogr. No. 5, 1951. Chicago: Univ.
    Chicago Press.
[12] French, J. W. The selection of standard tests for factor analysis. Amer.
    Psychologist, 1952, 7, 297. (Abstract)
[13] French, J. W. Kit of selected tests for reference aptitude and achievement
    factors. Princeton: Educational Testing Service, 1954.
[14] Henrysson, S. The significance of factor loadings. Brit. J. Psychol.,
    Statist. Sect., 1950, 3, 159-165.
[15] Hoel, P. G. A significance test for minimum rank in factor analysis.
    Psychometrika, 1939, 4, 149-158.
[16] Hotelling, H. The most predictable criterion. J. educ. Psychol., 1935, 26,
    139-142.
[17] Lawley, D. N. The estimation of factor loadings by the method of maximum
    likelihood. Proc. Roy. Soc. Edin., 1940, 60, 64-82.
[18] Lawley, D. N. Further investigations in factor estimation. Proc. Roy. Soc.
    Edin., 1941, 61, 176-185.
[19] Lawley, D. N. Problems in factor analysis. Proc. Roy. Soc. Edin., 1949,
    62, 394-399.
[20] Lawley, D. N. Factor analysis by maximum likelihood: a correction. Brit.
    J. Psychol., Statist. Sect., 1950, 3, 76.
[21] McNemar, Q. On the sampling errors of factor loadings. Psychometrika,
    1941, 6, 141-152.
[22] McNemar, Q. On the number of factors. Psychometrika, 1942, 7, 9-18.
[23] McNemar, Q. The factors in factoring behavior. Psychometrika, 1951, 16,
    353-359.
[24] Rao, C. R. Estimation and tests of significance in factor analysis.
    Psychometrika, 1955, 20, 93-111.
[25] Rippe, D. D. Application of a large sampling criterion to some sampling
    problems in factor analysis. Psychometrika, 1953, 18, 191-205.
[26] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press,
    1947.
[27] Thurstone, L. L. and Thurstone, T. G. Factorial studies of intelligence.
    Psychometric Monogr. No. 2, 1941. Chicago: Univ. Chicago Press.
[28] Wold, H. Some artificial experiments in factor analysis. In: Uppsala
    Symposium on Psychological Factor Analysis, 17-19 March 1953. Nordisk
    Psykologi's Monograph Series No. 3. Pp. 43-64.
[29] Young, G. Maximum likelihood estimation and factor analysis.
    Psychometrika, 1941, 6, 49-53.


Manuscript received 2/18/57
Revised manuscript received 8/26/57


PSYCHOMETRIKA—VOL, 23, NO. 2 
JUNE, 1958 


COMPARATAL DISPERSION, A MEASURE OF ACCURACY
OF JUDGMENT* 


HAROLD GULLIKSEN 
PRINCETON UNIVERSITY 
AND 


EDUCATIONAL TESTING SERVICE 


It is suggested that the ambiguity of a set of paired comparison judgments
may be measured by the quantity
$\sqrt{\sigma_i^2 + \sigma_j^2 - 2 r_{ij} \sigma_i \sigma_j}$. This quantity is
termed the comparatal dispersion. A simultaneous solution for scale values and
ratios of comparatal dispersions has been presented and applied to some data
on food preferences.


The discriminal dispersion may be taken to represent the ambiguity of a
single stimulus. The greater the ambiguity, the greater the discriminal
dispersion, and the nearer to .5 will be the proportion of judgments "$i$ less
than $j$" for any comparison involving that stimulus.

The Law of Comparative Judgment [8, 9, 10] states that

$s_i - s_j = z_{ij} \sqrt{\sigma_i^2 + \sigma_j^2 - 2 r_{ij} \sigma_i \sigma_j}$

where

$s_i$ and $s_j$ are scale values of the stimuli;
$\sigma_i$ and $\sigma_j$ are discriminal dispersions of the stimuli;
$r_{ij}$ is the correlation between judgments for the two stimuli;
$z_{ij}$ is the normal deviate corresponding to $p_{ij}$;
$p_{ij}$ is the proportion of judgments $i < j$.

So far no feasible solution for the $\sigma$'s (or the discriminal
dispersions) has been proposed. However, the radical term can be solved for in
certain cases. The quantity
$\sqrt{\sigma_i^2 + \sigma_j^2 - 2 r_{ij} \sigma_i \sigma_j}$, which represents
the ambiguity of the total comparative judgment, will be termed the comparatal
dispersion.
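
As a minimal sketch of the quantities just defined, the radical term and the
scale separation it implies can be written as two small functions. The
function names and the example values are hypothetical, and the sign of the
separation depends on the convention adopted for $p_{ij}$, which is not
settled here.

```python
from math import sqrt
from statistics import NormalDist


def comparatal_dispersion(sigma_i, sigma_j, r_ij=0.0):
    """Radical term of the Law of Comparative Judgment."""
    return sqrt(sigma_i ** 2 + sigma_j ** 2 - 2.0 * r_ij * sigma_i * sigma_j)


def scale_separation(p_ij, sigma_i, sigma_j, r_ij=0.0):
    """Normal deviate for p_ij multiplied by the comparatal dispersion."""
    z_ij = NormalDist().inv_cdf(p_ij)
    return z_ij * comparatal_dispersion(sigma_i, sigma_j, r_ij)


# Equal, uncorrelated discriminal dispersions of 1 give a dispersion of sqrt(2).
equal_case = comparatal_dispersion(1.0, 1.0)
```

A proportion of .5 gives a zero separation whatever the dispersion, which is
why greater ambiguity pulls observed proportions toward .5 for a fixed
difference in scale values.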
The problem of the present paper is "Does the magnitude of the comparatal
dispersion vary with variation in the complexity of the judgment?" The
complexity of the judgment was varied by using what we will term unitary
stimuli together with what we will term composite stimuli. This gives the
possibility for three different degrees of "complexity of judgment" as
illustrated in Table 1.

*This research was jointly supported in part by Princeton University, the
Office of Naval Research under contract Nonr-[number illegible], and the
National Science Foundation under grant NSF G-642, and in part by Educational
Testing Service. Reproduction in whole or in part is permitted for any purpose
of the United States Government.

In item A, one unitary stimulus (Loin Lamb Chop) is paired with another
unitary stimulus (Sirloin Steak). Since two unitary stimuli are involved for
the judgment required in item A, we may say that this represents a judgment
of complexity 2, which will be indicated by the subscript 2.


TABLE 1

Illustrative Stimuli

                                                                  Complexity
 Item   Stimulus                                        Type      of Judgment

  A     Loin Lamb Chop                                  Unitary        2
        Sirloin Steak                                   Unitary

  B     Loin Lamb Chop and Roast Rib of Prime Beef      Composite      4
        Sirloin Steak and Roast Loin of Pork            Composite

  C     Sirloin Steak                                   Unitary        3
        Loin Lamb Chop and Boiled Smoked Beef Tongue    Composite


In item B, the subject is asked to choose between one composite stimulus 
(Loin Lamb Chop and Roast Rib of Prime Beef) and another composite 
stimulus (Sirloin Steak and Roast Loin of Pork). Since four unitary stimuli 
are involved for the judgment required in item B, we will designate these 
by the subscript 4 and say that this represents a judgment of complexity 4. 

Similarly, item C involves a comparison of a unitary and a composite 
stimulus. We will call this a judgment of complexity 3 since three unitary 
stimuli are involved in the judgment, and designate these by the subscript 3. 
The total schedule used involved five unitary stimuli (Lamb, Steak, 
Beef, Pork, and Tongue) together with the ten composites of these five. All
possible pairs of these fifteen stimuli Were used, except those which involved 
the repetition of the same stimulus. In other words, items of the form “Do 
you prefer Beef and Lamb, or Beef and Pork?” were omitted, since it was 


felt that some persons might judge one composite against the other composite, 




while some might decide to ignore the common element and simply give a
comparison between (in this illustration) Lamb and Pork. We also omitted 
items of the form “Do you prefer Steak, or Steak and Tongue?” With these 
omissions, the total schedule was composed of ten items of complexity 2, 
thirty items of complexity 3, and fifteen of complexity 4. These items were 
randomly interspersed in the schedule given to the subjects. 
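
The item counts follow from the pairing rule and can be verified by
enumeration. The sketch below is illustrative only: it represents each stimulus
as a set of dish names (a hypothetical encoding), forms all pairs of the
fifteen stimuli, and drops any pair whose two members share a dish, which
covers both kinds of omitted items described above.

```python
from collections import Counter
from itertools import combinations

UNITARY = ["Lamb", "Steak", "Beef", "Pork", "Tongue"]


def build_schedule(unitary):
    """Return (left, right, complexity) for every admissible pair."""
    stimuli = [frozenset([dish]) for dish in unitary]                  # 5 unitary
    stimuli += [frozenset(pair) for pair in combinations(unitary, 2)]  # 10 composite
    items = []
    for left, right in combinations(stimuli, 2):
        if left & right:      # a dish appears on both sides: item omitted
            continue
        items.append((left, right, len(left) + len(right)))
    return items


schedule = build_schedule(UNITARY)
counts = Counter(complexity for _, _, complexity in schedule)
```

The enumeration gives ten items of complexity 2, thirty of complexity 3, and
fifteen of complexity 4, or fifty-five items in all.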

In the directions to the subjects, it was stressed that if two main dishes 
were presented in a choice, each was an ordinary-sized serving, and that it 
was to be assumed that both must be eaten, thus giving twice as much as if 
only a single dish were chosen. 

The responses to this schedule were analyzed to determine which of 
several laws of value increase were in agreement with the data and the results 
reported in [3]. Here we are presenting another analysis of the same data to 
see if the complexity of judgment influences the comparatal dispersion. 

The observed choices of the 92 college students are shown in Table 2,
the corresponding normal deviates and relevant sums in Table 3. The
incomplete data method described in [2] for constructing matrix $M$ and vector
$Z$ was used to construct the $F$ matrix and $z_i$ (the elements of the vector
$Z$) shown in Table 4.


TABLE 2
[Observed choices; table body not legible]


TABLE 3
Food Preferences - Experimental Inter-Stimulus Distances (from obs. proportions)
[Table body not legible]


TABLE 4
[Values for the matrix F and the vector Z; table body not legible]




The thirty judgments of complexity 3 were scaled, giving scale values
for all fifteen stimuli, including the scale values of the five unitary
stimuli, designated $s_{u(3)}$, and the scale values of the ten composite
stimuli, designated $s_{c(3)}$. From the judgments of complexity 2, scale
values, designated $s_{u(2)}$, for the five unitary stimuli were obtained. Thus
for the five unitary stimuli it is possible to compare directly the scale
values determined from judgments of complexity 3, $s_{u(3)}$, with the scale
values determined from judgments of complexity 2, $s_{u(2)}$, as shown in
Figure 1.

FIGURE 1
Scale Values $s_2$ vs. $s_3$ (complexities 2 vs. 3)


Correspondingly from the judgments of complexity 4, scale values,
designated $s_{c(4)}$, can be determined for the ten composite stimuli and
compared with the scale values, $s_{c(3)}$, determined from judgments of
complexity 3, as shown in Figure 2.
If the comparatal dispersions for the judgments from which the $s_{u(2)}$ are
computed are the same as the comparatal dispersions for the judgments from
which the $s_{u(3)}$ are computed, then the standard deviation of the scale values,

$$\sqrt{\frac{\sum_{i=1}^{n} (s_{iu} - \bar{s}_u)^2}{n - 1}},$$

would be the same for $s_{u(2)}$ and $s_{u(3)}$. If a given set of stimuli
have scale values $s_2$ determined from judgments with a small comparatal
dispersion, while scale values $s_3$ are determined from judgments with a large
comparatal dispersion, then the standard deviation of the $s_2$ values will be
larger than the standard deviation of the $s_3$ values, because the comparatal
dispersion serves as a unit of measure. For example, a set of distances
(measured in centimeters) will have a larger standard deviation than the
same set of distances (measured in inches).

In Figure 1 the solid line, $s_{u(2)} = 1.46\, s_{u(3)}$, shows the observed
relationship between the scale values determined from judgments of
complexity 2 and those determined from judgments of complexity 3. The slope
of this line, 1.46, is an estimate of the quotient obtained by dividing the
comparatal dispersion for judgments of complexity 3 by the comparatal
dispersion for judgments of complexity 2.

FIGURE 2
Scale Values $s_{c(4)}$ vs. $s_{c(3)}$ (Complexities 4 vs. 3), with the observed line of slope .93

For judgments of complexity 2, one possibility is that the discriminal
dispersions of each of the two stimuli will combine independently to give the
comparatal dispersion for the judgment. According to such a possibility the
comparatal dispersion for judgments of complexity 2 would be
$\sqrt{\sigma_i^2 + \sigma_j^2}$, or $\sigma\sqrt{2}$ if the two discriminal
dispersions were equal. Similarly, if the three dispersions combine
independently for judgments of complexity 3, then the comparatal dispersion
will be $\sqrt{\sigma_i^2 + \sigma_j^2 + \sigma_k^2}$, or $\sigma\sqrt{3}$ for
equal discriminal dispersions. That is to say, according to this view the
comparatal dispersion would be proportional to the square root of the
complexity of the judgment. If the comparatal dispersions are proportional
to the square root of the complexities, then for $s_{u(2)}$ and $s_{u(3)}$ the
comparatal dispersions are proportional respectively to $\sqrt{2}$ and $\sqrt{3}$.
Using these as the unit of measurement, $s_{u(2)}/s_{u(3)} = \sqrt{3/2} = 1.22$. The
ratio observed was even larger, $s_{u(2)}/s_{u(3)} = 1.46$, as shown by the solid line in
Figure 1. The individual stimuli can be identified from Table 6 since they
are in order of magnitude of scale values.
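The square-root rule described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes equal discriminal dispersions (taken as 1.0) that combine independently. Because scale values are expressed in units of the comparatal dispersion, the predicted ratio of two sets of scale values is the inverse ratio of their dispersions.

```python
import math


def comparatal_dispersion(discriminal_dispersions):
    """Combine independent discriminal dispersions (root sum of squares)."""
    return math.sqrt(sum(d * d for d in discriminal_dispersions))


def predicted_scale_ratio(complexity_num, complexity_den, sigma=1.0):
    """Predicted ratio s(complexity_num) / s(complexity_den).

    Assumes every stimulus has the same discriminal dispersion `sigma`
    (an assumption of this sketch). A larger comparatal dispersion means a
    larger unit, hence numerically smaller scale values.
    """
    disp_num = comparatal_dispersion([sigma] * complexity_num)
    disp_den = comparatal_dispersion([sigma] * complexity_den)
    return disp_den / disp_num


ratio_2_vs_3 = predicted_scale_ratio(2, 3)  # sqrt(3/2)
ratio_4_vs_3 = predicted_scale_ratio(4, 3)  # sqrt(3/4)
print(round(ratio_2_vs_3, 2), round(ratio_4_vs_3, 2))  # prints 1.22 0.87
```

The first value is the theoretical ratio for the unitary stimuli; the second is the corresponding prediction when complexities 4 and 3 are compared.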

The Pitman-Morgan variance ratio test for correlated data [1, 4, 5, 6] was
used to test the hypothesis that this variance ratio $(1.46)^2$ is equal to 3/2.
The result was a $t$ of 3.9 with three degrees of freedom, which corresponds
to a $p$ between .01 and .05. Testing the hypothesis that the variance ratio
equals one instead of 3/2 yields $t$ = 10.5 for three degrees of freedom. This is
near the .001 level since the $t$ value for a $p$ of .001 with three degrees of
freedom is only 12.9, while the corresponding value at the .01 level is 5.8.
Thus the hypothesis of equality of comparatal dispersions for $s_{u(2)}$ and
$s_{u(3)}$ is clearly excluded. The ratio of the squares of the comparatal dispersions
is at least 3 to 2 and probably greater. It should be noted that since the two
sets of data are correlated, the variance ratio test for correlated data was used.
Using the statistical test appropriate for uncorrelated data will give different
(and erroneous) results (see [13], p. 26).
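A minimal sketch of the Pitman-Morgan test may make the procedure concrete. It uses the standard formulation in which the variances of two paired series are equal exactly when their sums and differences are uncorrelated; a hypothesized ratio other than one is handled by rescaling one series first. The function name and the data are hypothetical (the scale values of the paper are not reproduced here), and only the $t$ statistic and its degrees of freedom are returned.

```python
import math

import numpy as np


def pitman_morgan_t(x, y, ratio=1.0):
    """t statistic (n - 2 df) for H0: var(x) / var(y) == ratio, paired data."""
    x = np.asarray(x, dtype=float)
    # Rescale y so that the null hypothesis becomes "equal variances".
    y = np.asarray(y, dtype=float) * math.sqrt(ratio)
    n = len(x)
    # Equal variances <=> zero correlation between sums and differences.
    r = np.corrcoef(x + y, x - y)[0, 1]
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t, n - 2


# Hypothetical paired scale values for five stimuli.
x_demo = [2.0, 4.0, 6.0, 8.0, 10.0]
y_demo = [2.0, 1.0, 4.0, 3.0, 5.0]

t_equal, df = pitman_morgan_t(x_demo, y_demo)            # H0: ratio = 1
t_ratio4, _ = pitman_morgan_t(x_demo, y_demo, ratio=4.0)  # H0: ratio = 4
print(df, round(t_equal, 3), round(abs(t_ratio4), 3))
```

With five stimuli the statistic has three degrees of freedom, as in the test reported above. In the demonstration data the sample variance ratio is exactly 4, so the second statistic is zero.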


Turning to the composite stimuli as scaled by judgments of complexity
3 and by judgments of complexity 4, $s_{c(4)}/s_{c(3)} = .93$ as shown by the solid
line in Figure 2. Each of the ten composite stimuli can be identified from
Table 6, since they are in the order given by their scale values. If the
comparatal dispersions had been proportional to the square root of the
complexities, then for $s_{c(4)}$ and $s_{c(3)}$ the comparatal dispersions would be
proportional respectively to $\sqrt{4}$ and $\sqrt{3}$. Using these as the unit of
measurement we would find that $s_{c(4)}/s_{c(3)} = \sqrt{3/4} = .87$. On the other hand,
if the comparatal dispersions had been equal, this ratio would have been unity.
The data are consistent with either hypothesis. The Pitman-Morgan variance
ratio test (in each case) gives a $t$ of about 0.7, which for eight degrees of
freedom corresponds to a $p$ of about .5.

An indication of the relationship between the magnitude of the comparatal
dispersion and complexity of judgment can be obtained as indicated
here by scaling the data separately and comparing the variances of the
stimuli. The Pitman-Morgan variance ratio test for correlated data gives a
method for testing the correlated variance ratios. However, it is desirable to
have a method of simultaneously solving for scale values and variance ratios
to be certain that some sort of best fit to the data has been obtained. A
simultaneous least squares solution has been found.


A Least Squares Solution


In order to develop a least squares solution, regard the experimental
matrix of differences in scale values, $D_{ij}$, as partitioned into four submatrices,
each homogeneous with respect to complexity of judgment, as illustrated in
Table 5. For $n$ unitary stimuli, 1, 2, ..., $n$, let the scale value differences be
recorded in submatrix $D_{ij2}$ of Table 3, these being from judgments of
complexity 2. The scale value differences for the composite stimuli,
$n + 1$, $n + 2$, ..., $m$ [where $m = n + (n/2)(n - 1)$] are recorded in submatrix $D_{ij4}$
of Table 3, these being from judgments of complexity 4. The judgments of
complexity 3, involving one unitary and one composite stimulus, give
differences which are recorded in submatrices $D_{ij3}$ and $D_{ji3}$. In this
analysis the differences $D_2$ have a weight $w_2$, the differences $D_4$ have a
weight $w_4$, and the differences $D_3$ have unit weight. In order to express all
distances in similar terms these weights should be directly proportional to the
comparatal dispersions or to the unit used. For example, if some distances are
measured in inches and some in feet, then multiplying the latter by 12 will
express all distances in similar terms. Since the comparatal dispersions are
proportional to the reciprocal of the standard deviations of the set of scale
values, $w_2$ and $w_4$ are proportional to the reciprocals of $S_{u(2)}$ and $S_{c(4)}$,
respectively. Thus we can allow the weight to vary with complexity of judgment
and so determine the effect of complexity of judgment on comparatal
dispersion. It should also be noted that the matrices $D_2$, $D_3$, and $D_4$ may be
matrices of complete or incomplete data. Only the incomplete data case will
be considered here since it generalizes easily to the complete data case, and
since experiments with composite stimuli are likely to be incomplete data
experiments.


TABLE 5

Partitioned Matrix of Differences of Scale Values

[Table body not recoverable from the scan.]


Define $E$, the error to be minimized, as

(1) $$E = \sum_{g} \sum_{i,j}^{m_{ij}} (w_g D_{ijg} - s_i + s_j)^2,$$

where subscript $g$ takes the values 2, 3, and 4 as indicated in Table 5.

$w_g$ represents the unknown weights $w_2$ and $w_4$ and unity, the weight for $D_3$.
$D_{ijg}$ represents the experimentally determined differences in scale values.
$s_i$ and $s_j$ represent unknown scale values.
$m_{ij}$ indicates that the summation is over the incomplete data (as many cells as are available).

Minimizing

$$\sum_{g} \sum_{i,j}^{m_{ij}} [D_{ijg} - w_g (s_i - s_j)]^2$$

would present a direct comparison between the experimental value $D_{ijg}$
and the unknown $w$ and $s$. However, the resulting nonlinear equations
present difficulties not found in the linear solution resulting from (1).

Differentiating with respect to the $s_i$,

(2) $$\frac{1}{2} \frac{\partial E}{\partial s_i} = 2 \sum_{j}^{m_{ij}} (s_i - s_j - w_g D_{ijg}).$$

Setting the derivative equal to zero and separating the terms involving
$w_2$ from those involving $w_4$,

(3) $$m_i s_i - \sum_{j}^{m_{ij}} s_j - w_2 \sum_{j=1}^{n} D_{ij2} = \sum_{j=n+1}^{m} D_{ij3} \qquad (i = 1, 2, \ldots, n),$$

and

(4) $$m_i s_i - \sum_{j}^{m_{ij}} s_j - w_4 \sum_{j=n+1}^{m} D_{ij4} = \sum_{j=1}^{n} D_{ij3} \qquad (i = n + 1, n + 2, \ldots, m).$$

$m_i$ indicates the number of cells in the row (or column) containing
observations involving $s_i$.

Differentiating with respect to the $w_g$,

(5) $$\frac{1}{2} \frac{\partial E}{\partial w_g} = \sum_{i} \sum_{j} (w_g D_{ijg}^2 - s_i D_{ijg} + s_j D_{ijg}).$$

Setting the derivative equal to zero and separating the terms involving $w_2$
from those involving $w_4$,

(6) $$w_2 \sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij2}^2 - 2 \sum_{i=1}^{n} \Big( s_i \sum_{j=1}^{n} D_{ij2} \Big) = 0,$$

and

(7) $$w_4 \sum_{i=n+1}^{m} \sum_{j=n+1}^{m} D_{ij4}^2 - 2 \sum_{i=n+1}^{m} \Big( s_i \sum_{j=n+1}^{m} D_{ij4} \Big) = 0.$$

Designating the double summations by $V_2$ and $V_4$, respectively, (6)
and (7) may be written

(8) $$w_2 V_2 - 2 \sum_{i=1}^{n} \Big( s_i \sum_{j=1}^{n} D_{ij2} \Big) = 0,$$

and

(9) $$w_4 V_4 - 2 \sum_{i=n+1}^{m} \Big( s_i \sum_{j=n+1}^{m} D_{ij4} \Big) = 0.$$

Equations (3), (4), (8), and (9) constitute a set of $m + 2$ equations in
$m + 2$ unknowns $(s_1, s_2, \ldots, s_m, w_2, w_4)$, which may be designated
as the unknown column vector $X$. Define the following row sums from the
matrices $D_{ij2}$, $D_{ij3}$, $D_{ji3}$, and $D_{ij4}$:

$$z_{i2} = -\sum_{j=1}^{n} D_{ij2} \qquad (i = 1, 2, \ldots, n),$$
$$z_{i4} = -\sum_{j=n+1}^{m} D_{ij4} \qquad (i = n + 1, n + 2, \ldots, m),$$
$$z_{i3} = +\sum_{j=n+1}^{m} D_{ij3} \qquad (i = 1, 2, \ldots, n),$$
$$z_{i3} = +\sum_{j=1}^{n} D_{ij3} \qquad (i = n + 1, n + 2, \ldots, m).$$

Define the column vector $Z_{.3}$ containing $m + 2$ elements as the transpose of

$$(z_{13}, z_{23}, \ldots, z_{n3}, z_{(n+1)3}, z_{(n+2)3}, \ldots, z_{m3}, 0, 0).$$

Matrix $M$ is an $m \times m$ matrix constructed from $D_{ijg}$ as follows:

put $-1$ in each off-diagonal cell where data exist;
put 0 in each off-diagonal cell where no data exist;
each diagonal entry is $m_i$, the negative sum of the nondiagonal elements
in the row (or column).

$M$ is symmetric and each row and each column sums to zero. Matrix $F$ is an
$(m + 2) \times (m + 2)$ matrix constructed by bordering $M$ with an $(m + 1)$th
column

$$(z_{12}, z_{22}, \ldots, z_{n2}, 0, 0, \ldots, 0)$$

and an $(m + 1)$th row

$$(2z_{12}, 2z_{22}, \ldots, 2z_{n2}, 0, 0, \ldots, 0),$$

as well as an $(m + 2)$th column

$$(0, 0, \ldots, 0, z_{(n+1)4}, z_{(n+2)4}, \ldots, z_{m4})$$

and an $(m + 2)$th row

$$(0, 0, \ldots, 0, 2z_{(n+1)4}, 2z_{(n+2)4}, \ldots, 2z_{m4}).$$

The lower right corner is completed with the $2 \times 2$ matrix

$$\begin{pmatrix} V_2 & 0 \\ 0 & V_4 \end{pmatrix}.$$

Using these definitions of matrix $F$ and the column vectors $X$ and $Z_{.3}$,
(3), (4), (8), and (9) may be written

(10) $$FX = Z_{.3}.$$

Since the $s_i$ are determined only within an additive constant, we may solve
by setting some $s_i$, say $s_1$, equal to zero, deleting the first element of $X$
and of $Z_{.3}$ and the first row and column of $F$, giving

(11) $$F_0 X_0 = Z_0,$$

which has the solution

(12) $$X_0 = F_0^{-1} Z_0.$$
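The construction of $F$ and $Z_{.3}$ and the direct solution in (10) to (12) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the computation reported in the paper: the function names, the boolean `mask` marking available cells, and the synthetic error-free data (three unitary and three composite stimuli) are all hypothetical. With error-free data the known scale values and weights should be recovered exactly.

```python
import numpy as np


def build_system(D, mask, n):
    """Build F and Z.3 from an m x m antisymmetric matrix of differences D.

    Rows/columns 0..n-1 are unitary stimuli, n..m-1 composite stimuli.
    `mask` is True where an off-diagonal cell contains data.
    """
    m = D.shape[0]
    D = np.where(mask, D, 0.0)                      # ignore missing cells
    M = np.diag(mask.sum(axis=1)).astype(float) - mask.astype(float)
    z2 = -D[:n, :n].sum(axis=1)                     # z_i2, i = 1..n
    z4 = -D[n:, n:].sum(axis=1)                     # z_i4, i = n+1..m
    z3 = np.concatenate([D[:n, n:].sum(axis=1),     # z_i3, unitary rows
                         D[n:, :n].sum(axis=1)])    # z_i3, composite rows
    F = np.zeros((m + 2, m + 2))
    F[:m, :m] = M
    F[:n, m] = z2                                   # (m+1)th column
    F[m, :n] = 2.0 * z2                             # (m+1)th row
    F[n:m, m + 1] = z4                              # (m+2)th column
    F[m + 1, n:m] = 2.0 * z4                        # (m+2)th row
    F[m, m] = (D[:n, :n] ** 2).sum()                # V_2
    F[m + 1, m + 1] = (D[n:, n:] ** 2).sum()        # V_4
    Z = np.concatenate([z3, [0.0, 0.0]])
    return F, Z


def solve_scale_values(D, mask, n):
    """Set s_1 = 0, delete its row and column, and solve the reduced system."""
    F, Z = build_system(D, mask, n)
    X0 = np.linalg.solve(F[1:, 1:], Z[1:])
    return np.concatenate([[0.0], X0])              # (s_1..s_m, w_2, w_4)


# Synthetic error-free data: w_g * D_ijg = s_i - s_j in every cell.
n, m = 3, 6
s_true = np.array([0.0, 0.5, 1.2, 0.3, 0.9, 1.6])
w2_true, w4_true = 0.7, 0.95
D = s_true[:, None] - s_true[None, :]
D[:n, :n] /= w2_true                                # complexity 2 block
D[n:, n:] /= w4_true                                # complexity 4 block
mask = ~np.eye(m, dtype=bool)                       # complete data here

X = solve_scale_values(D, mask, n)
print(np.round(X, 4))
```

Incomplete data are handled by setting the missing pairs of cells in `mask` to `False`; the stimuli must remain connected through the available cells for the reduced system to have a unique solution.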


An Iterative Solution

An iterative solution for (10) may also be indicated given a first
approximation, say $P_1$, to the solution $X$. Such a first approximation might be
obtained from scaling the results separately from $D_2$, $D_3$, and $D_4$, and then
computing the $w_2$ and $w_4$ as ratios of standard deviations, as was illustrated
in a previous section. The derivative of (2) with respect to $s_i$ is

(13) $$\frac{1}{4} \frac{\partial^2 E}{\partial s_i^2} = m_i.$$

Likewise from (5),

(14) $$\frac{1}{2} \frac{\partial^2 E}{\partial w_g^2} = \sum_{i} \sum_{j} D_{ijg}^2 = V_g.$$

One possible iterative procedure is to use the following correction to
obtain the $(p + 1)$th approximation from the $p$th approximation, where $x$
is the solution to the equation $f'(x) = 0$.

(15) $$x_{p+1} = x_p - \frac{f'(x_p)}{f''(x_p)}.$$

The first and second derivatives of $f(x)$ are designated $f'(x)$ and $f''(x)$
respectively. This iterative procedure is suggested by Ledyard Tucker as a
modification of the scoring method for estimation of parameters given by
Rao ([7], p. 165).

In the present case this procedure may be indicated in matrix form by
defining $N$ as a row vector of reciprocals of the solutions given in (13) and (14),

(16) $$N = \Big( \frac{1}{m_1}, \frac{1}{m_2}, \ldots, \frac{1}{m_m}, \frac{1}{V_2}, \frac{1}{V_4} \Big),$$

(17) $$P_{p+1} = P_p + kN(Z_{.3} - FP_p) = P_p + kNE_p,$$

where $E_p = Z_{.3} - FP_p$ and each element of $N$ multiplies the corresponding
element of $E_p$.

The multiplying fraction $k$ may be taken as unity unless this value seems to
give convergence with wide fluctuations. If so, then convergence can be made
smoother with shorter steps by taking $k$ as some value such as 3/4 or 1/4.

In solving simultaneously for the scale values and the ratios of standard
deviations for the present data, several methods were tried. The Gauss-Seidel
method ([12], pp. 255-258) seemed to give the most rapid convergence
for the present data. The final solution is given as column X in Table 6,
together with the values for $Z_{.3}$, $FX$, and $(Z_{.3} - FX)$, thus indicating the
accuracy of the solution for (10).

In Table 6, column Y shows the solution for scale values when it has
been assumed that $w_3 = 1$, $w_2 = \sqrt{2/3} = .8165$, or about 0.8, and
$w_4 = \sqrt{4/3} = 1.1547$, or about 1.2. The values in column X approximate a
least squares simultaneous solution for the 15 scale values and the values of
$w_2$ and $w_4$. For both $w_2$ and $w_4$ the best value was smaller than the initial
guess indicated above. In the present experiment judgments involving two stimuli
have a clearly smaller comparatal dispersion than those involving three.
However, the change from three-stimulus to four-stimulus judgments does
not seem to affect the comparatal dispersion.
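The correction step described in this section, in which each residual of $FP = Z$ is divided by the corresponding diagonal entry of $F$ and scaled by the fraction $k$, can be sketched as follows. The function name and the small demonstration system are hypothetical; the demonstration matrix is chosen to be strictly diagonally dominant so that the iteration is known to converge, which is an assumption of this sketch and not a property claimed for the matrix $F$ of the paper.

```python
import numpy as np


def scaled_iteration(F, Z, P0, k=1.0, n_iter=200):
    """Iterate P <- P + k * N * (Z - F P), with N the reciprocal diagonal of F."""
    F = np.asarray(F, dtype=float)
    Z = np.asarray(Z, dtype=float)
    N = 1.0 / np.diag(F)             # reciprocals of the m_i and of V_2, V_4
    P = np.array(P0, dtype=float)
    for _ in range(n_iter):
        residual = Z - F @ P         # discrepancy of the current approximation
        P = P + k * N * residual     # element-by-element scaled correction
    return P


# Hypothetical, strictly diagonally dominant demonstration system.
F_demo = np.array([[4.0, -1.0, 0.0],
                   [-1.0, 4.0, -1.0],
                   [0.0, -1.0, 3.0]])
X_true = np.array([1.0, 2.0, 3.0])
Z_demo = F_demo @ X_true

P = scaled_iteration(F_demo, Z_demo, P0=np.zeros(3), k=0.75)
print(np.round(P, 6))
```

Taking `k` below unity shortens each step, which corresponds to the smoothing of convergence mentioned in the text.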


TABLE 6


Solution for Scale Values and Ratios of
Y    X    Z.3    FX    (Z.3 - FX)
Sclution given 
n (3) 
Br Tongue (T) -10.010 -.0012 
„а Pork (Р) 65 -5.350 -.0010 
1.043 Lamb (L) .9221 -1.340 0002 
1.746 Beef — (B)| 1.5844 5.150 0005 
2.197 Steak (S) 2.0111 1.090 .0012 
.000 T+P T 0.0000 -4.890 0025 
.270 T+L .22?6 -3.860 -8520 -.0011 
820 T +B „7615 -.910 -.9100 .0000 
8 Poi -.890 -.8902 .0002 
T«S .000 .0001 -.0001 
Р+В 1.030 1.0098 .0012 
Р+5 2.500 2.5002 -.0002 
+в 3.430 3.4304 = .0004 
L+s 4,470 4,4703 -.0005 
Bs 5.120 5.1210 -.0010 


- 2 


REFERENCES
[1] Finney, D. J. The distribution of the ratio of estimates of the two variances in a
sample from a normal bi-variate population. Biometrika, 1938, 30, 190-192.
[2] Gulliksen, H. A least squares solution for paired comparisons with incomplete data.
Psychometrika, 1956, 21, 125-134.
[3] Gulliksen, H. Measurement of subjective values. Psychometrika, 1956, 21, 229-244.
[4] Kenny, D. T. Testing of differences between variances based on correlated variates.
Canad. J. Psychol., 1953, 7, 25-28.
[5] Morgan, W. A. A test for the significance of the difference between the two variances
in a sample from a normal bi-variate population. Biometrika, 1939, 31, 13-19.
[6] Pitman, E. J. G. A note on normal correlation. Biometrika, 1939, 31, 9-12.
[7] Rao, C. R. Advanced statistical methods in biometric research. New York: Wiley, 1952.
[8] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.
[9] Thurstone, L. L. A mental unit of measurement. Psychol. Rev., 1927, 34, 415-423.
[10] Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.
[11] Thurstone, L. L. and Jones, L. V. The rational origin for measuring subjective values.
J. Amer. statist. Ass., 1957, 52, 458-471.
[12] Whittaker, E. T. and Robinson, G. The calculus of observations. London: Blackie, 1929.
[13] Yarrow, L. J. The effect of antecedent frustration on projective play. Psychol. Monogr.,
1948, 62, No. 6.


Manuscript received 7/1/57
Revised manuscript received 9/25/57


PSYCHOMETRIKA—VOL. 23, NO. 2
JUNE, 1958


APPLICATION OF THE QUARTIMAX METHOD OF ROTATION
TO THURSTONE'S PRIMARY MENTAL ABILITIES STUDY*


CHARLES WRIGLEY
MICHIGAN STATE UNIVERSITY
DAVID R. SAUNDERS
EDUCATIONAL TESTING SERVICE
AND
JACK O. NEUHAUS
UNIVERSITY OF CALIFORNIA


This study compares a quartimax rotation of the centroid factor loadings
for Thurstone's Primary Mental Abilities Test Battery with factorings of
the same correlation matrix by Thurstone (simple structure), Zimmerman
(revised simple structure), Holzinger and Harman (bi-factor analysis), and
Eysenck (group factor analysis). The quartimax results agree very closely
with the solutions of Holzinger and Harman and of Eysenck, and reasonably
well with the two simple structure analyses. The principal difference is
the general factor provided by the quartimax solution. Reproduction of the
factorial structure is sufficiently good to justify its use at least as the first stage
of rotation. More extensive trial of the method will be needed with more
varied data before it will be possible to decide whether quartimax factors
meet psychological requirements sufficiently well without further rotation.


Rotation in factor analysis aims at decreasing the complexity of the
factorial description of the tests or other variables being studied. It is
convenient to describe each such test factorially in terms of one or two large
loadings and as many zero or near-zero loadings as possible. Since methods
of providing this kind of description have in the past been somewhat
subjective and laborious, there has been rather general agreement that a more
definitive mathematical procedure is required. A logical statement of the
rotational problem is needed which, while adequately encompassing the
psychological objectives, is sufficiently simple and precise to make possible
an analytic mathematical formulation.

*We wish to thank Professor L. G. Henyey and the University of California
Computer Center for making the IBM 701 electronic computer available for this
study, and the National Science Foundation for its support of the work of the
Computer Center. Professor H. F. Kaiser of the University of Illinois has made
helpful criticisms of the paper, and Mr. Thomas S. Davis of the University of
California has assisted with preparation of the tables. The research was
supported in part by the United States Air Force under Contract No.
AF 33 (038)-25726 monitored by the Air Force Personnel and Training Research
Center. Permission is granted for reproduction, translation, publication, use
and disposal in whole and in part by or for the United States Government.

A program for calculation of the quartimax and varimax loadings, prepared
by Professor H. F. Kaiser, is available in the library of computer programs
held by the Computer Center (program No. 464). Mr. J. O. Neuhaus and
Mr. K. W. Dickman have prepared a quartimax program for Illiac at the
University of Illinois. This latter program will be usable on three other
computers recently built or under construction: Mistic (Michigan State
University), Silliac (University of Sydney), and a machine being constructed
by Iowa State College.

The quartimax method attempts to meet these requirements. This 
procedure is one that maximizes the sum of fourth powers of the rotated 
factor loadings. The analytic rotational techniques recently reported by
Carroll [3], by Neuhaus and Wrigley [12], and by Saunders [14], although 
developed from different lines of reasoning, all reduce in the orthogonal case 
to this simple maximization, so that the name quartimax may be applied to 
any of them. A similar fourth-power rotational criterion has been proposed 
by Ferguson [5]. 

Two practical advantages of this approach are easily seen. First, the 
procedure is objective; two investigators who start with the same data will 
secure the same results. Second, as a corollary of the first, the procedure is 
one that lends itself to machine computation. Quartimax results do not 
have to be the final solution. For those wanting simple structure, whether 
orthogonal or oblique, or some other preferred solution, further rotations 
can be made by conventional methods, starting from the quartimax loadings. 
Alternatively, the maximization function can possibly be modified better 
to meet psychological needs. The purpose of this article is to indicate how 
well quartimax loadings agree with factorial results already published. 

For this purpose a quartimax rotation has been made of Thurstone's
Primary Mental Abilities factor loadings [16]. In this study a battery of 57
tests was administered to 240 college students, 13 centroid factors were
extracted from the tetrachoric correlations, and 12 of these factors were
rotated to orthogonal simple structure. Seven were identified with assurance
and two others tentatively. The factors were given the following names:
Verbal, Spatial, Numerical, Perceptual, Memory, Word Fluency, Induction,
Restrictive Reasoning, Deduction. The Primary Mental Abilities study is
of great historical importance, since Thurstone supplied the first large-scale
illustration of his newly devised factorial methods. This study also
demonstrated the practicability of extending factor analysis to batteries of fifty or
more tests. It is hardly surprising, therefore, that the data have been repeatedly
reanalyzed, including a bi-factor analysis by Holzinger and Harman [8], a
group factor analysis by Eysenck [4] using Burt's method [2], and a revised
simple structure solution by Zimmerman [20], which started from Thurstone's
simple structure loadings and aimed to improve them.

While Thurstone's study is historically important, it is a pioneer one
and may be criticized in terms of present-day standards. The sample of
subjects is rather small. Furthermore, the use of tetrachoric instead of
product moment correlations not only means larger standard errors but also
inconsistencies among the correlations, so that the matrix is non-Gramian
even with unities in the leading diagonal. The quartimax results reported
here are therefore only of limited value in study of the organization
of abilities. The primary purpose of the paper, however, is
the methodological one of finding the extent to which the analytic quartimax
method of rotation approximates the standard nonanalytic procedures. The
existence of four prior factorial solutions, derived by different methods and
in terms of different logics, but all maintaining orthogonal reference axes,
makes the Primary Mental Abilities study a good test of the extent to which
quartimax results approximate standard rotational ones.


Procedure

The quartimax rotation of the full set of 13 centroid factors (Thurstone
rotated only the first twelve) was made on the IBM 701 electronic computer
at the University of California. The machine orders provide for successive
pairings of factors in the order: 1, 2; 1, 3; ...; 1, m; 2, 3; ...; 2, m; ...;
m - 1, m, where m denotes the number of factors. For each pair of factors,
the machine finds the angle of rotation which will give the maximal sum of
fourth powers of loadings for the transformed pair of factors and makes
the transformation whenever the angle is one minute or greater. A set of
m(m - 1)/2 pairings will be called a cycle. Cycles of pairings continue until
the fourth-power sum for the full matrix no longer increases. The procedure
can be proved to yield converging values for the fourth-power sums ([12], p.
83). There is generally a rapid increase in the fourth-power sum at first and a
slower increase later. Table 1 gives the fourth-power sum at the end of each
cycle. Although 15 cycles were required for convergence to seven decimal

TABLE 1


Rate of Convergence for Fourth-Power Sum: Quartimax Rotation,
Centroid Loadings, Primary Mental Abilities Study
First dif- " ыў 
f 5 е. е oj 
= uc AN us - seta) incrédde 
powers 

0 8 " 7 8 

i 2: 3100 2.815804 т 

E 12.680711 .336611 9T. 

2 12:138563 1057792 EE 

: 12.152656 1018093 99.83 

à 12.756190 003534 92.9 3 

А 12.151299 .001109 99.977 

H 12.151121 -000428 99.921 

s 12.151905 .000118 99.998 

Я 12.151915 .000070 22: 

H 12.758008 1000033 99.999 
12.758023 .000015 109.000 
12.758028 -000003 
22 190032 1000002 
к -000000 | 


12.758032 


places, more than 99 per cent of the increase had been realized by the end
of three cycles. The entire solution required less than ten minutes on the
computer, including reading cards with data into the machine and printing
results. If the 99 per cent increase had sufficed, results could have been
calculated and printed within two or three minutes.
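The cycle of pairwise rotations described under Procedure can be sketched in NumPy. This is an illustrative reimplementation, not the IBM 701 program: the function name, the demonstration loadings, and the stopping rule (stop when a full cycle applies no rotation of one minute of arc or more) are assumptions of the sketch. For a pair of factors with loadings $x$ and $y$, writing $u = x^2 - y^2$ and $v = 2xy$, the angle that maximizes the sum of fourth powers satisfies $\tan 4\phi = 2\sum uv / \sum (u^2 - v^2)$.

```python
import numpy as np


def quartimax(loadings, min_angle=np.deg2rad(1.0 / 60.0), max_cycles=100):
    """Orthogonal quartimax rotation by successive pairwise rotations.

    Returns the rotated loadings and the fourth-power sum after each cycle
    (the first entry is the sum before any rotation).
    """
    L = np.array(loadings, dtype=float)
    n_factors = L.shape[1]
    history = [np.sum(L ** 4)]
    for _ in range(max_cycles):
        rotated = False
        # Pairings in the order 1,2; 1,3; ...; 1,m; 2,3; ...; m-1,m.
        for a in range(n_factors - 1):
            for b in range(a + 1, n_factors):
                x, y = L[:, a].copy(), L[:, b].copy()
                u = x * x - y * y
                v = 2.0 * x * y
                # Angle maximizing the fourth-power sum for this pair.
                phi = 0.25 * np.arctan2(2.0 * np.sum(u * v),
                                        np.sum(u * u - v * v))
                if abs(phi) >= min_angle:       # skip negligible rotations
                    c, s = np.cos(phi), np.sin(phi)
                    L[:, a] = c * x + s * y
                    L[:, b] = -s * x + c * y
                    rotated = True
        history.append(np.sum(L ** 4))
        if not rotated:                         # a full cycle changed nothing
            break
    return L, history


# Hypothetical two-factor loadings with clear structure, rotated by 30 degrees.
A = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.0],
              [0.0, 0.7], [0.1, 0.8], [0.0, 0.6]])
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
B = A @ R

L, history = quartimax(B)
print(np.round(history, 6))
```

Because each pairwise rotation is orthogonal, the communalities of the tests are unchanged, and the recorded fourth-power sums never decrease from one cycle to the next.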

Because of the high speed of an electronic computer, it is feasible to 
calculate the full set of angles of rotation for each cycle. With punched-card 
equipment this is probably an inefficient procedure, since some rotations in 
each cycle will be very small and increments in the fourth-power sum very 
slight. Bolin ([1], pp. 234-239) has developed a technique to try to speed 
convergence by selecting pairs of factors where rotation promises to make 


the greatest difference. 
Results 


The proportion of high, medium, and low loadings in the various solutions 
will first be considered. The factorial results and interpretations in the five 
solutions will then be compared for the principal factors. 


Proportion of High, Medium, and Low Loadings 


Percentages of factor loadings of different sizes and signs in the five 


sets of results are summarized in Table 2. These percentages are based on the 


TABLE 2 


Percentage of Loadings of Different Signs and Sizes


Positive Negative 


Quartimax 
Thurstone 
Zimmerman 
Holzinger & Harman 
Eysenck 


nine factors for the quartimax and Zimmerman solutions reported in this
paper, the eight Thurstone factors, and for the eight factors together with
the general factor for the Holzinger and Harman and for the Eysenck
solutions. The quartimax method gives a higher percentage of high loadings
than either simple structure solution, and at the same time more variables
appear in the hyperplanes. These results are to be expected from the
quartimax logic of maximizing the dispersion of the squared loadings (or maximizing
the kurtosis, as Saunders expresses the logic of the method). At the same
time there are nineteen negative loadings of .200 or greater, whereas there
are none in any other solution. Negative loadings have been contrary to most
thinking about aptitudes, although Burt has argued for the meaningfulness
of bipolar factors.

The group factor analyses of Eysenck and of Holzinger and Harman
have a still higher proportion of high and low loadings. Group factor methods
insure a large number of zero loadings, since operations are performed on
submatrices. On the other hand, loadings obtained by group factor methods
do not usually give as good a fit to the observed correlations as do those
obtained by rotation of centroid loadings.


Comparison of the Factorial Structures

The main quartimax factors are presented in Tables 3-11. Factors are
ordered in terms of their contribution to test variance (except that two
factors in the perceptual area and two in the memory area are presented
together). In each table tests are arranged in order of size of quartimax
loadings. Alongside are the loadings for the best matching factors in the
other solutions. Each table includes all tests with a loading of .400 or greater
in at least one of the five solutions. This was the size of loading selected
by Thurstone in his report. In the bi-factor analysis and the group factor
analysis, all loadings are zero except for tests selected for inclusion in the
factor.

General-Verbal Factor (Table 3). This quartimax factor agrees closely
with Holzinger and Harman's and Eysenck's general factors. There is no
general factor in either simple structure solution.

The group factor methods provide a Verbal Factor in addition to the
general factor of "intelligence." In either simple structure solution, the
Verbal is the largest of all factors. The peculiarity of the quartimax results
is that the Verbal Factor has merged with the general factor and has no
separate identity. If the quartimax loadings are compared with those of
Holzinger and Harman, they will be seen to be generally higher for verbal
tests but lower for nonverbal tests.

The quartimax results do not therefore fully agree with either side in
the dispute about the general factor, but represent a midway
position. Like Holzinger and Harman and like Eysenck, the quartimax
solution has a general factor; like Thurstone and Zimmerman, one factor
rather than two is sufficient to delimit the general-verbal area. In Table 3
both the general factors of Holzinger and Harman and of Eysenck and the
Verbal Factors of Thurstone and Zimmerman have been included.

Spatial and Numerical Factors (Tables 4-5). The five solutions
agree substantially in their selection of tests for the Spatial and Numerical



TABLE 3 


General-Verbal Factor* 


Name of test                         Loadings
                                     Quartimax  Thurstone  Zimmerman  H & H  Eysenck

60. Vocabulary (Thorndike) .932 .385 -676 m qui 
5. Reading II .B65 «506 7% .66 .652 
11. Completion . 826 pol «i 669 
10. Inventive Opposites -803 559 +62 649 
lh. Reading I +789 638 .56 55% 
16. Inventive Synonyms +761 478 -59 612 
58. Vocabulary (Chicago) E 163 32 398 
hl. Verbal Analogies 732 h59 .81 bak 
6. Verbal Classification 7 313 .82 Bi 
7. Word Grouping +723 478 «65 68 
57. Gramar -687 420 +63 683 
lh. Pattern Analogies EC] 2h8 лї TI? 
52. Theme «565 435 «5l. 533 
kO. Reasoning 656 465 .68 658 
h3. Code Words .653 352 06 258 
56. Spelling | .636 33 46 897 
42. False Premises 633 391 «Gh. 653 
55. Sound Grcuping «599 300 +70 707 
li. Disarranged Sentences «569 211 «66 657 
47. Initials +562 248 i51 527 
12. Disarranged Words .5%8 230 .56 605 
l5. Syllogisms .538 | .226 715 690 
21. Form Board 514 117 „бт 610 
9. Controlled Association hon 222 .27 223 
39. Arithreticel Reasoning 97 | „292 .68 683 
13. First and last Letters 495 172 де 537 
25. Mechanical Movements 3453 136 “52 585 
37. Number Series 450 032 «63 627 
l9. Word Recognition «838 -161 472 
35. Tabular Completion +420 056 «51 565 
28. Copying -.086 .58 515 
29. Areas +396 038 +58 561 
2%. Punched Holes .393 о 151 565 
26. Identical Forms .391 122 Al 418 
15. Anagrams +390 -.013 -39 437 
38. Numerical Judgment .390 192 28 183 
19. Lozenges A -388 310 “5k 520 
30. Number Code +386 Бе .68 618 
50. Rhythm +376 БЕП ho hog 
34. Division +320 КИЛ .hó 461 
23. Surfece Development -308 21010 152 510 
18. Cubes +284 -.026 .51 495 
48, Number-Number -279 -.057 o 420 
8. Figure Classification +269 5.131 no Dm 
22. Lozenges ^ 2258 . -.003 53 .50k 
17. Block Counting -23 -.01l. | om sho +389 


*Names given to this factor by the other investigators:


General--Eysenck, Holzinger & Harman;
Verbal--Thurstone, Zimmerman.


Factors. It is interesting that the four judgmental procedures and the analytic 
method agree in this way, suggesting that placement of axes in factor analysis 
has been a less arbitrary affair than some critics have maintained. 
Perceptual Factor (Table 6). The quartimax solution has two factors in 
the area, labeled Perceptual A and Perceptual B, each with two loadings 
above .400. The Picture Recall and the Disarranged Sentences Tests, 
grouped together in the Perceptual A Factor, also appear together in Zimmer- 
man’s solution as his Memory for Observed Relationships Factor, so that he 
also isolated two factors within this cluster of tests, whereas other investigators 




TABLE 4


Spatial Factor* 


- 


Name of test 
НЕН Eysenck 
— 
20. Flags .833 -636 -750 
22. Lozenges B .155 -633 +622 
18. Cubes 737 .626 .606 
21. Form Board .663 15 489 
23. Surface Development .659 .551 497 
7. Block Counting 5 3 159 
24. Punched Holes O45 .335 +53 
19. Lozenges A +642 E .512 
27. Pursuit .619 584 +555 
53. Hands 599 +455 T2 
pe Figure Classification 512 -393 d 
5. Syllogisms .507 +430 aa 
28. Copying 1680 +270 3 
30. Number Code ATL -109 
l3. Code Words 58 21 ons 
29. Areas MA .223 3% 
6. Verbal Classification Er 11 eS 
55. Sound Grouping .288 hi2 
M n 
*Names given to this factor:
Spatial--Thurstone, Zimmerman, Holzinger & Harman;
Visuo-spatial--Eysenck.


TABLE 5


Numerical Factor*


Name of test                         Loadings

                                     Quartimax

31 Multiplication 
32; ddition 7 Ёё do 
+ Subtraction 1698 A эй 
3%. Division D 8 g 
A Number Code 1681 n is 
Ls Numerical Judgment 68 Е 165 
x Tabular Completion "432 d dà 
7+ Arithmetical Reasoning ‘i 
"Homes given to this factor: 
Numerical--Thurstone, Zimmerman;
Arithmetical--Holzinger & Harman, Eysenck.


[...] Thurstone has nine loadings above .400 on the Perceptual
Factor, whereas Zimmerman, and also Holzinger and Harman, have only [...]
has four. This seems to be a case in which the weight of opinion has been
against Thurstone's analysis. Zimmerman [...] a Perceptual Factor to be more
easily interpreted with the reduced


158 PSYCHOMETRIKA 


TABLE 6 


Perceptual Factor* 


Name of test loadings 


Quartimax A | Quartimax B | Thurstone | Zimmerman | H & H

Picture Recall .562 .231 .5%5 

14. Disarranged Sentences -559 +008 261 
59. Word Count .221 +426 +360 
7. Word Grouping .206 .160 -573 

6. Verbal Classification -166 -3384 .537 

26. Identical Forms +136 .627 .603 
11. Completion +033 +082 22 
41. Verbal Analogies +090 -.058 07 
м. ‘tern Analogies .086 .%%2 435, 
60. Vocabulary (Thorndike) -.080 +005 м2


*Names given to this factor:
Perceptual Speed--Thurstone, Zimmerman;
Imagination--Holzinger & Harman;
Classification--Eysenck.


TABLE 7 


Memory Factor* 


Name of test Loadings 


Quartimax A | Quartimax B | Thurstone | Zimmerman | H & H


48, Number-Number 72 


46. Word-Number +436 s E 8 
47. Initials +350 +208 1487 im 
50. Figure Recognition| .332 139% 20 Е 
h9. Word Recognition БҮ] 437 1381 e 


*Name given to this factor:


Memory--Thurstone, Zimmerman, Holzinger & Harman, Eysenck. 


loadings. Somewhat different names and interpretations have been given to 
the factor by the different investigators. Thurstone [17] isolated a number of
perceptual factors in his factorial study of perception. The existence of two 
quartimax factors in the area does not therefore seem to be unreasonable. 

Memory Factor (Table 7). The quartimax solution for a second time
provides two factors in the area, labeled Memory A and Memory B. The
second memory factor is a doublet including the two recognition tests.
Other investigators have isolated only one factor in the memory area.




Fluency Factor (Table 8). The Fluency Factor appears in the quartimax
solution only as a doublet. Our results agree with those of Eysenck. Thur- 
stone and Zimmerman secured a much stronger factor. 

Induction Factor (Table 9). No Induction Factor was isolated either in 
the bi-factor analysis or the group factor analysis. In Zimmerman's solution 
the analysis of the reasoning area was appreciably changed, so that
Thurstone's Induction Factor was renamed by Zimmerman the Classification
Factor, while his Restrictive Reasoning Factor became the General Reasoning
Factor. In the reasoning area, as in the perceptual area, it has proved difficult
to isolate clearly defined factors. The quartimax factor has something in


TABLE 8 


Fluency Factor*


Name of test 


Anagrams 

First and last Letter 
Disarranged Words 
Grammar 

Spelling 

Vocabulary (Thorndike) 


*Names given to this factor:
Word Fluency--Thurstone;
Letter Fluency--Zimmerman;
Completion--Holzinger & Harman;
Verbal-Linguistic--Eysenck.
TABLE 9 


Induction Factor*


Name of test Loadings


Quartimax


Figure Classification
Number Series
Pattern Analogies
Tabular Completion
Areas

Arithmetical Reasoning
Numerical Judgment


*Names given to this factor:
Induction--Thurstone;
General Reasoning--Zimmerman




common with the Induction Factor of Thurstone and something in common
with the General Reasoning Factor of Zimmerman.

Audio-Rhythmic Factor (Table 10). The only two highly-loaded tests are
Sound Grouping and Rhythm. Holzinger and Harman's Rhythm Factor and
Eysenck's Audio-Rhythmic Factor comprise the same pair of tests. These
are also the two tests with highest loadings on Zimmerman's Classification
Factor, although he found two other tests loaded upon the factor. Thurstone
does not have a comparable factor.

Syllogistic Reasoning Factor (Table 11). This quartimax factor, like the
preceding one, is a doublet. All investigators agree upon this factor, and
in all solutions except Thurstone's the factor is marked by only a pair of
tests (False Premises and Reasoning). This is our reason for replacing


TABLE 10 


Audio-Rhythmic Factor*


Name of test Loadings 


Quartimax | Zimmerman 


55. Sound Grouping 
Rhythm 


8. Figure Classification 
45. Syllogisms 
25. Mechanical Movements


*Names given to this factor:
Classification--Zimmerman;
Rhythm--Holzinger & Harman;
Audio-Rhythmic--Eysenck.


TABLE 11 


Syllogistic Reasoning Factor*


Name of test Loadings


Quartimax | Thurstone | Zimmerman | H & H | Eysenck
42. False Premises .580 .518 .629 58 1515
40. Reasoning .529 .525 .608 E 1915
25. Mechanical Movements .132 оз 1328 Уе
8. Figure Classification .121 .398 .226 E ME


*Names given to this factor:
Deduction--Thurstone, Zimmerman;
Logical Reasoning--Holzinger & Harman;
Relational--Eysenck.




Thurstone's wider name of Deduction by the narrower one of Syllogistic
Reasoning.

Table 12 gives the transformation matrix for the rotation from centroid
to quartimax loadings. In Table 13 appears the full set of quartimax factor
loadings.

The cosines of the angles between the quartimax rotated axes and
Thurstone's simple structure axes [19] are: Numerical, .968; Spatial, .920;
Verbal, .879; Memory A, .803; Induction, .750; Syllogistic Reasoning,
[...]; Fluency, .560; Perceptual B, .515; Perceptual A, .496.
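
As a minimal sketch of how cosines of this kind can be computed (not the authors' own program), suppose each solution is described by a transformation matrix whose columns give the directions of its axes relative to the same centroid axes. The cosine between two axes is then the inner product of the corresponding columns after scaling to unit length. The helper name `axis_cosines` and the 2 x 2 matrices are illustrative assumptions; they are not values from Table 12.

```python
import math

def axis_cosines(t_a, t_b):
    """Cosine of the angle between matching columns of two transformation
    matrices, each given as a list of rows (hypothetical helper)."""
    n_rows, n_cols = len(t_a), len(t_a[0])
    cosines = []
    for j in range(n_cols):
        col_a = [t_a[i][j] for i in range(n_rows)]
        col_b = [t_b[i][j] for i in range(n_rows)]
        dot = sum(a * b for a, b in zip(col_a, col_b))
        norm_a = math.sqrt(sum(a * a for a in col_a))
        norm_b = math.sqrt(sum(b * b for b in col_b))
        cosines.append(dot / (norm_a * norm_b))
    return cosines

# Illustrative data only: reference axes versus axes rotated by 30 degrees.
theta = math.radians(30)
identity = [[1.0, 0.0], [0.0, 1.0]]
rotated = [[math.cos(theta), -math.sin(theta)],
           [math.sin(theta), math.cos(theta)]]
cosines = axis_cosines(identity, rotated)
```

Because each column is rescaled to unit length before the inner product is taken, the same sketch applies whether or not the reference axes of the second solution are orthogonal.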


Discussion 


The quartimax method might be adjudged useful on either of two
grounds: first, that it provides a sufficiently good approximation to the
rotational solution to warrant its use in the initial stage as a step
towards the goal; or, second, that the quartimax results sufficiently meet
psychological requirements to be taken as ends in themselves.

Let us start by considering the degree of agreement between the various
analyses. Three classes of factors may be distinguished.


TABLE 12 
Orthogonal Transformation Matrix for Rotating 
Centroid Matrix to the Quartimax Factor Matrix 
(A) | 1G) F I AR SR 2 13 
= 
zo| 088 | oor | 95 -012 ©з | 
7 


-275 | -559 

-039 | -21 -032 | -223 172 | 3+ | -211 

-0651 | -232 187 548 nos 219 2127 
бо» | -161 -634 -160 | 023 ni 133 

35 | -212 | 31 da | 268 | 328 360 
161 | -326 -334 | 538 182 | -33% | -625 
369 | 362 281 | 07% oo. | 620 | 
354 | -150 188 | 532 -611 | -038 266 
зво | -125 396 | -065 | 289 =321 | 


Decimal points have been omitted.


1. Clearly defined and easily recognized factors upon which all
investigators agree, and in which there is virtually no doubt as to the
tests most representative of the factor, e.g., the Numerical and Spatial
Factors.

2. Factors which all investigators tend to isolate, but with disagreements
as to the most suitable name, the most representative tests, and the
amount of variance attributable to the factor. The perceptual area, for
example, does not seem to be as well structured as either the spatial or the
numerical area. It will be recalled that Thurstone had nine tests with loadings



TABLE 13 


Quartimax Factor Loadings: Thurstone's Primary
Mental Abilities Study 


Factor 
— | 
=== zl i 

Test| cv. | s | н | Pla) | xB) мл) | мв) ғ | I1 | a | sR | i2 | 13 
4 789 | -035 | -099 | -213 | -013 122 | -058 | -207 | -002 oru 095 029 113 
5 | 865 | 027 | -0% | -163| 033 | wo] 118| -209 | -095 | оё | -013 | o68| ©2 
6 | 725 | 334 | o1 | 266 | 33%) обо | -135| -097 | 165 | 172] -o20 | -об2| 009 
т | 7233| 128 | 051| 2% | 160 | -167 | -028 | -o7 | 081] 011] -окә | 22 | -052 
8 269 | 512 | oo) -052 | 205 | -o1 | -006 | 138] 475 | 197| 121| 087| -027 
9 | 498 | -173 | ооз) 009 | 037 | 008 | -œ6| 193 | -005 | oo1| -16 | o7] 515 
10 803 į -003 | 085 | 198 | -219 | -025 | -075 | 116 | -128 | -0o20 | -068 | 013 | 195 
1i | 826 | 133 | -095 | 093 | 082 | -012| -061| 136 | -151 | -2%6 | -222 | -076 | -023 
12 | 58 | 175 | 128 | 035 | oš | -оо1 | 180 | 339 | -013 | 2050 | -ox2 | -086 | -212 
13 | i95 | 167 | 090] -089 | -039 | -122 | -о13 | 506 | -о21 | о2о | ово | ودد‎ | 081 
us | 569 | 200 | 186] 559 | 068 | о58 | -озз | O17 | 034 | 181| озо | 053 | 057 
15 | 390 | 038 | 172} -032 | -055 | 100| oo| 518 | 082 | озі | -o32 | -102| 055 
16 | 761 | -018 | 115) оз -008 | 017} 243 | -033 | -o22 | 111 | 123 | -036 
17 | 230 | 605 | 102| 158 | -0%5 | -115 | -082 | -08 | ойт | -326 | -183 | 059 | 073 
18 28. | 737 | o92 | 117 | 153 | -097 | -143| 106 | отг | -009| -oo8 | -132 | -150 
19 | 308 | G42 | o31| -079 | -037 | -159 | 261| -22& | -075 | 087 | ودد‎ | ose | -®%3 
20 131| 833 | 137 | -032 | -013 | -059 | -066 | o21 | -197 | о5т | со8 | 097 | -182 
2 514 | 663 | -035 | -002 | -036 | -o2b | 297| 089 | 100 | -202 | -201 | -089 | 100 
22 | 268| 755 | 031| 088 | 035 | 201| от | -003 | -oœ8 | 152! 200 | -007 | 005 
23 308 | 659 | 035 | - | 107 | 261 | -o11| o5} | -097 | -o12 | o6 |-©8| 231 
2% | 393 | 645 | 020] -abı | 018 | oce | 327 | -052 | 2118 | -083 | mo | onm | 083 
25 | 553 391 | -o71 | -108 | 051 | -069 | 109 | 088 | 253 | Ius | 132 | -02 | -063 
26 | 391| 279 | -111| 136 | 627 | 056 | -065 | -013 | -037 | -0%6 | -107 | 08 | -030 
27 155 | 619 | 2% | ol0 | o21 | -16 | -199| 055 | 159 | 118 -090 | -099 | 212 
28 | hoo| ёо | 102| 108 | оті | љо | обо | -oo5 | 39% | -142 | 188 | -016 | 0% 
29 | 396 | W6 | 14 | -106 | -222 | 289 | o9 | -073 | 287 | -o68 | ооз | -116| ої 
30 3 471 581 130 | -041 103 001 022 015 | -180 027 337 | -039 
31 112 | 167 | 746) 029 | 075 | o9 | -018 | о8о 031 | oo | o90 | 227 | %5 
32 299 | От | 698| 095 | -142 | -150 116 O46 | -091 198 106 o2 108 
38 | 264 19. | 833 | 006 | ост | о58 | oki | or |-o36 | oio | -160 | -108 | -016 
3 | 220) 200 | 657 | -128 | 066 | 057 | -206 | -050 | 052 | o21 | обо |-2e3 | -132 
35 20 | 226 | h32 | -267 | -015 | 253 -035 | 011 | 288 | -026 | 066 | -03b | -04 
5 p92 | 251 | -003 | 23} | -327 | -053 | -137 | -2%6 | -o7 | -358 | 122 | 122 | -103 
37 | жо д1 | 336| OM | -os | 197 --o16 | -o9 | “use | “ove | 269 | 055 | 1% 
38 | i90 | 271 | 468 | -%5 | -ол1 | 098 | -oo6 | -273 | 108 | -102 | -029 | -34 | -109 
22 | ФТ] 367 | e| 036 | -a3 | 388 | -133 | -x76 | ise | 305 | 059 |2367 | 015 
ho | 656 158 | 055 | -153 | -158 | 028 | 125 | ооб | 102 | 2250 529 | oh2 | -072 
к | aae j gas | 060) 090 | -058 | -o97 | -087 | -057 | 201 | o3 | -09 | 018 | обв 
42 633 | 105 | os | -030 | -010 | 1% -056 | 038 | оті | -038 | 580 | -об5 | -024 
3 | $53] 458 | 286) -ob2 | orr| 369 | 030 | -b7 | 17 | oos -019 | 259 | -131 
hà 670 | 336 | -038| 086 | ох | -132 043 | -005 | 425 | -103 | -oo& | -o48 | -06 
43 538 507 RH = 959 198 -198 | 021 | 062 | 152 | os2 | 026 | 116 
9 3l o3 16 | -124 |- 158 
E E aro] -om | 350) EEE oa | a | S92 | ae 
026 | -o6: E = -028 
m i ns E 083 mem us di 01 029 EA yon 1002 
50 13 | 7250 | 332| 394 | 061 | -оот | ox "оф | -188 
51 060 | 137 562] 231 | O8 | 088 | -100 | o «om | SO -018 
EAE IEEE IE IE TE IEEE | “Gor IC AE IE: 
53 - -019 | 157 | -065 | 107 |. 2 -125 
5% 316 i е -0% | -091 | -012 | o36 | 097 | отб ced Ed E 200 
22 | Gla] am] anl ses | 1021 | 98% | de | 538 | on | oso | -055 
B - = = E 259 |-09% | 032 | 165 |-3%1 | -002 

51 687 | 090 | 167| o2b | -250 | -o0 | -096 231 | o x 
$8 | Т? | -086 | 020) -259 | 131 | -29 | 136 | 13b E on ep aus 203 
59 263 | -101 | 264) 221| 426 | -023 | 031 | -125 | ощ -032 | 075 | -o71 | 136 
бо | 932 | 030 | -019) -O40 | 005 | 058 | =o? | oor | -003 |-i35 | -000 | Ат | з 

|uare4 
Eus һ5.2%5 | 7-460 puer i oae 1.159 |1.083 |1.639 1.517 lion 1.302 h.165 [1.13% 
JA l- 


above .400 on the Perceptual Factor, but Zimmerman, also aiming at isolation
of a simple structure factor, found only three. The complexity of the area 
was revealed in Thurstone’s subsequent intensive analysis [17] of perceptual 




tests. There seems to be a similar difficulty in deciding [...] of the
Fluency Factor.

3. Factors [...] by some investigators but not by others. There are [...]
illustrations in this paper. Thurstone's Induction Factor did not survive
Zimmerman's reshuffle but reappears in the quartimax solution.
Holzinger and Harman identify an Analogies Factor, based on the Verbal
Analogies, [...] Words, and Pattern Analogies Tests, but with very small
loadings. The same three tests appear together on Zimmerman's Eduction
Factor with [...]. In the quartimax solution, the three appear [...]
Verbal Factor.

[...] of methods has indicated the very complicated [...] tests. Guilford [6]
has found the necessity [...] ever-increasing number of factors as his
researches have [...]. Guttman [7] has invoked new mathematical models in
the attempt [...] patterns more closely. It is hardly surprising that one
[...] reveals some of the linkages (i.e., factors) and [...] others. The
quartimax results provide an interesting example. [...] investigators had
reported only one factor in the memory area. The quartimax provides two.
The second factor represents the fact that the two recognition tests have a
higher correlation with one another than with the recall tests, i.e., the
submatrix of correlations of the five memory tests is not of rank one.

The Primary Mental Abilities tests can therefore be classified in various
ways: [...] advantages and disadvantages in each system. There has
[...] in the past as to whether it is better to have a hierarchical
organization of factors, as is characteristic of the Eysenck and the
[...] or a coordinate organization, as is to be [...] such as those of
Thurstone and Zimmerman. [...] differences between the two types of
classification [...], since there is in fact rather little disagreement
[...] the factors represented in the battery. The main [...] emphasis to be
attached to the various factors.

The [...] matter of concern is whether the quartimax method provides a
sufficiently reasonable rotational approximation to warrant its use in the
[...]. Except for the absence of the [...] reproduces Holzinger [...]
doublets. Eysenck's results are fairly similar. [...] form of classification,
agreement is not quite as good [...] simple structure solutions. Nonetheless,
[...] are represented in the quartimax results. [...] emphasis. Thurstone's
Verbal Factor [...] and the Perceptual and Word Fluency Factors are reduced
in size and importance. The quartimax results agree on the whole somewhat
better with Zimmerman's revision than Thurstone's original solution.

The proposition that the quartimax results sufficiently meet psychological 
requirements to be taken as an end in themselves is obviously harder to main- 
tain. For many the main deficiency in the quartimax solution will be the 
presence of a general factor. Obviously a quartimax solution is not a simple 
structure. 

The presence or not of a general factor in a quartimax rotation depends 
upon the particular set of correlations. (See the examples in [12].) Because 
the sum of squares remains constant in orthogonal transformation, the highest 
fourth-power sum is attained by concentrating variance for each test into one 
or as few loadings as possible in each row of factor loadings. That is to say,
the objective of the quartimax method is to get the simplest possible factorial 
description of each test, and this is attained by finding the nearest solution 
to unifactoriality (each test loaded on a single factor) which is achievable 
by orthogonal transformation. The quartimax function places no restriction 
upon the distribution of variance by columns. If concentration of variance 
occurs largely in a single column, i.e., with respect to a single factor, we 
get a general factor. Otherwise we do not. (The maximization function is 
performed at present on pairs of columns, but this is merely a computational 
necessity because no solution is currently available for the nonlinear equations 
resulting from maximizing the full set of loadings simultaneously, [12], p. 82.) 
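
The pairwise procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the program used in the study: the function names, the fixed number of sweeps, and the small loading matrix are all invented for the example. For a pair of columns x and y, with u = x^2 - y^2 and v = 2xy, the fourth-power sum after a planar rotation through an angle phi differs from a constant by one quarter of C cos 4phi + D sin 4phi, where C is the sum of u^2 - v^2 and D is twice the sum of uv. The maximizing angle is therefore one quarter of atan2(D, C).

```python
import math

def quartimax_criterion(loadings):
    """Sum of fourth powers of all loadings (the quantity being maximized)."""
    return sum(a ** 4 for row in loadings for a in row)

def rotate_pair(loadings, p, q):
    """Rotate columns p and q in their plane by the angle that maximizes the
    fourth-power sum for that pair; all other columns are left unchanged."""
    c = d = 0.0
    for row in loadings:
        u = row[p] ** 2 - row[q] ** 2
        v = 2.0 * row[p] * row[q]
        c += u * u - v * v
        d += 2.0 * u * v
    phi = math.atan2(d, c) / 4.0
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    for row in loadings:
        x, y = row[p], row[q]
        row[p] = x * cos_phi + y * sin_phi
        row[q] = -x * sin_phi + y * cos_phi

def quartimax(loadings, sweeps=20):
    """Cycle through all pairs of columns a fixed number of times,
    working on a copy so the input matrix is not modified."""
    rotated = [row[:] for row in loadings]
    n_factors = len(rotated[0])
    for _ in range(sweeps):
        for p in range(n_factors - 1):
            for q in range(p + 1, n_factors):
                rotate_pair(rotated, p, q)
    return rotated

# Invented unrotated loadings for four tests on two factors.
unrotated = [[0.6, 0.5], [0.7, 0.4], [0.5, -0.5], [0.6, -0.6]]
rotated = quartimax(unrotated)
before = quartimax_criterion(unrotated)
after = quartimax_criterion(rotated)
```

Each planar rotation is orthogonal, so the sum of squares in every row is unchanged while the fourth-power sum cannot decrease. With more than two factors the result of such a cycle can depend on the order in which pairs are taken, which is the limitation the text notes.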

The general factor has been the subject of considerable controversy. 
Spearman [15] criticized Thurstone’s own solution in this study because of the 
absence of a general factor. He argued that a general factor was entirely 
reasonable, since this provided statistical representation for the predominantly 
positive correlations among the tests. Some others agreed with Spearman. 
Neither Holzinger and Harman nor Burt could see any psychological objec- 
tion to a general factor, and both the group factor method and the bi-factor 
method specifically provide for one. 

The issue for Thurstone, on the other hand, was not so much whether a
general factor exists as that of the best way of portraying it for scientific 
purposes. He argued that a general factor lacked the invariant property of 
group factors, i.e., that group loadings would change less when further tests 
were added to the battery. Hence his second criterion for simple structure 
eliminated the possibility of a general factor by requiring some zero or near- 
zero loadings in every column, even if oblique axes were needed to achieve 
this. But he regarded the issue as still open. He wrote: “The newer methods 
(the multiple-factor methods) leave it as a question of fact whether a general 
factor is in the battery and whether it is an orthogonal general factor or a
second-order general factor” ([18], p. 273). And the acceptance of second-order
general factors indicated a willingness to accept some form of hierarchical
organization among the simple structure factors.

At least four lines of action are possible in view of this tendency for
the quartimax method to give a general factor: first, to regard the quartimax
solution as an approximation to the desired rotation and to complete the
rotation graphically; second, to regard the general factor as evidence for
poor test selection and as an indication that the data should be reanalyzed
with a reduced set of variables; third, to accept the general factor as
psychologically [...] and reasonable, as Spearman and others would have done;
fourth, to modify the maximization function in the attempt to get rotational
solutions conforming more closely to simple structure. Each alternative will
be briefly discussed.
1. The first possibility is to regard the quartimax loadings as only an
approximation and to make final adjustments, including elimination of the
general factor, by plotting graphs. For example, the variance of the quartimax
general factor might be split by rotation with the bipolar Audio-Rhythmic
Factor, or with the unreported Factors 12-13. Cattell has followed this
procedure with quartimax solutions he has obtained from Illiac (the University
of Illinois computer). This may eventually prove to be the only feasible
solution for the dilemma, but it is one that should not be accepted until other
avenues have been closed. The computational disadvantages of graphical
methods are evident. Even more important, subjective rotational solutions
suffer from the disadvantage that it is difficult to secure agreement upon the
definitive solution. For example, is Thurstone's simple structure solution
or is Zimmerman's simple structure solution to be preferred for the Primary
Mental Abilities study? So long as application of the techniques remains a
matter for the investigator's judgment, there will be no assurance that any
two persons starting with the same data will reach the same results, and
factor analysis will remain an arbitrary affair. The objectivity of an analytic
approach is a virtue not to be discounted lightly.

2. A general factor might be regarded as evidence for poor test selection.
There may be too many complex variables in the battery which carry
appreciable loadings on more than one factor, or disproportionately many
tests drawn from some areas relative to others. For example, too many verbal
tests may have been included in the Primary Mental Abilities study. The
[...] quartimax solution could be used to identify over-represented [...]
and factorially complex tests. The data could then be reanalyzed with a
reduced set of variables in the hope that the general factor would disappear.
Achievement of a solution without a general factor would [...] experimental
design and attainment of a representative [...]; [...] but seems hazardous
[...] rather than [...] findings. Spearman tried to maintain his Two-Factor
Theory by [...] requirements of test selection, but it was eventually a
futile effort.

3. The existence [...] not be denied. The problem of the general factor is
often construed in terms of black and white; we should always accept a
general factor or never do so. But when
the experts disagree, so that a good case can be made both for and against,
may not the flexibility of the quartimax method, sometimes yielding a general
factor and sometimes not, be a desirable feature which will help in finding a
way around the impasse? The quartimax method is effectively an appeal to
parsimony. The aim is to find the rotation with the simplest factorial
description of the tests, in the sense that each test has appreciable loadings
on as few factors as possible. When a general factor increases this parsimony,
the general factor is accepted. This will usually enable test relationships to be
stated in slightly fewer factors than in a simple structure solution, and in
terms of a higher proportion of zero or near-zero loadings. But when a general
factor does not contribute to parsimony in the solution, it is rejected.

4. The fourth possibility is modification of the function for maximization.
A very promising modification is that of Kaiser [10] in his varimax
procedure. He has inverted the quartimax logic. Instead of aiming at
inequalities in the rows of loadings, his concern is to get maximal inequalities
in the columns. His logic is to get a simple and distinctive account of each 
factor, with some tests highly loaded on it and some not. He accordingly 
maximizes the sum of variances for the individual columns of squared loadings 
rather than the variance for all squared loadings considered together, as the 
quartimax function does. Kaiser finds the varimax results to correspond more 
closely to simple structure than do the quartimax. Although there are slightly 
fewer small loadings there is a better distribution of them, in terms of simple 
structure concepts. The examples he has reported [11] usually meet Thurstone’s 
five simple structure criteria. So long as varimax solutions are found to meet 
the simple structure criteria, it might be reasonable to accept the varimax 
rotation as the definitive simple structure. 
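
The contrast between the two functions can be illustrated with a small sketch. It uses the plain forms described in the text (the variance of all squared loadings taken together, and the sum of the column variances of squared loadings) without any further refinement; the function names and the two invented loading patterns are assumptions for the example, not data from the study.

```python
def quartimax_value(loadings):
    """Variance of all squared loadings considered together."""
    squares = [a * a for row in loadings for a in row]
    mean = sum(squares) / len(squares)
    return sum((s - mean) ** 2 for s in squares) / len(squares)

def varimax_value(loadings):
    """Sum over factors of the variance of the squared loadings
    within each column."""
    n_tests, n_factors = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(n_factors):
        squares = [loadings[i][j] ** 2 for i in range(n_tests)]
        mean = sum(squares) / n_tests
        total += sum((s - mean) ** 2 for s in squares) / n_tests
    return total

# Pattern 1: every test loads on the first factor only (a general factor).
general = [[0.8, 0.0] for _ in range(4)]
# Pattern 2: two tests on each factor (a coordinate, simple-structure pattern).
grouped = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]]

q_general, q_grouped = quartimax_value(general), quartimax_value(grouped)
v_general, v_grouped = varimax_value(general), varimax_value(grouped)
```

Both patterns describe each test by a single factor, so the quartimax value is the same for the two. The varimax value is zero for the general-factor pattern and positive for the grouped pattern, which shows why the column-wise function works against a general factor.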

A current problem, therefore, is to decide between the quartimax ro- 
tational model, which resembles more nearly the hierarchical organization 
favored by Burt, and the varimax model, which resembles more the coordinate 
organization adopted by Thurstone. How are we to decide which is the more 
suitable psychologically? 

Ideally, rotational solutions should meet the four requirements of: (a) 
objectivity and uniqueness, (b) meaningfulness, (c) parsimony of description, 
(d) numerical invariance of loadings for retained tests when further tests are 
added. 

Objectivity is one of the principal advantages of analytic solutions 
over graphical ones. By this is meant that another investigator starting with 
the same data and following the same analytic rules will obtain the same 
results. This does not hold for graphical methods. Thurstone and Zimmerman, 
both trying to apply the same simple structure criteria, reached different 
results. The objectivity of the analytic solutions has been stressed rather 
than the uniqueness because of a practical limitation of current computational 


methods. The desirable procedure would be maximization of the entire
[...]. The present method, however, operates with pairs of factors, and
different ordering of factors is known to give slightly different results
([12], p. 82). That is, at present an approximate quartimax solution is
calculated instead of the true one. But once a rule is made for the ordering
of factors, results become replicable for the same data and comparable for
different sets of data. This objectivity, however, is common both to the
quartimax and the varimax methods, so that it provides no ground for
selecting between them.

Nor is the requirement of meaningfulness of much assistance in deciding
between the two methods. The group factor method has seemed psychologically
acceptable to Burt, and simple structure to Thurstone. Since the [...] have
arrived at more or less the same factorial description of ability, it does
not seem reasonable to assert that one makes good psychological sense but
the other not.

In terms of parsimony, the advantage seems to rest with the quartimax
[...] rather than the varimax. The quartimax line of reasoning is indeed
[...] of the concept of unifactoriality ([9], pp. 95-98). If tests could
[...] be expressed in terms of only one factor, there is little doubt that a
unifactorial solution would be the preferred one. With empirical data, they
cannot be expressed so simply. The quartimax method aims at getting as
simple a factorial expression of each test as the data allow. Hierarchical
systems of classification have in many branches of science been found to be
very [...] and parsimonious.

The quartimax method is principally vulnerable on the issue of factorial
invariance. If Thurstone is correct that invariance can be attained with
simple structure but not with a general factor, and if invariance is required,
the quartimax method obviously has to be rejected as a final solution. The
evidence at present for the invariance of simple structure is derived
from: (a) Thurstone's box example ([18], pp. 369-376), and (b) experience
attained in application of simple structure methods to a wide variety of data.
The recent development of analytic methods of rotation seems to make
possible empirical studies of factorial invariance to try to specify more
precisely than hitherto the conditions under which loadings are invariant and
those under which they are not. Since varimax results conform more closely
to simple structure requirements, it may be hypothesized that the varimax
loadings should vary less than the quartimax ones as additional tests are added
to the battery. Kaiser has carried out a preliminary investigation of this
question, using the 24-variable correlation matrix of Holzinger and Harman
([...]). For these data the varimax loadings are indeed more stable, as
hypothesized. If this greater stability of varimax loadings proves
characteristic of a representative range of correlation matrices varying [...]
in factorial structure, it will supply a powerful reason for preferring the



varimax method to the quartimax. Although varimax is more stable than 
quartimax in the Holzinger and Harman example, it should not be inferred 
that quartimax results fluctuate greatly. They, too, are rather highly stable, 
and rank ordering of the principal tests defining any factor hardly ever 
changes. Indeed, there appear to be smaller differences in rank ordering of the 
principal tests with the analytic methods with varying numbers of tests than 
with Thurstone’s and Zimmerman’s Primary Mental Abilities simple struc- 
tures, subjectively obtained, although both are calculated for exactly the 
same set of tests. There seems a strong possibility that analytic methods in 
general will be more stable than the traditional graphical methods. 

Invariance has to this point been considered with respect to adding 
further tests to the battery. There is another problem of stability of factors, 
however, which seems so far to have received hardly any logical or empirical 
consideration. No generally accepted objective procedure is currently avail- 
able for determination of the number of factors to be rotated, and some 
investigators will extract and rotate more factors than others. Increased 
dimensionality seems on occasion to lead to splitting of factors. A factor with 
a large number of highly loaded tests will split into two smaller factors each 
incorporating a subgroup of the tests. The problem of invariance with relation
to an increased number of rotated factors has not yet been systematically
studied, but it appears likely that the quartimax factors, because of their
greater tendency to hierarchical organization, will be less likely to split than
their varimax counterparts, and in this regard will be the more invariant of
the two analytic techniques.

A final consideration in comparing quartimax and varimax may be the 
number of negative loadings. There are nineteen negative loadings of .200 or 
greater in the quartimax solution for the Primary Mental Abilities study but 
only four in the varimax. In this respect, therefore, the varimax solution 
seems preferable to the quartimax. 

To summarize these comparisons, present evidence suggests the quartimax
results are the more parsimonious and may be the more stable when the 
number of factors for rotation is increased, and that the varimax results are 
more stable when the number of tests in the battery is increased and at least in 
the Primary Mental Abilities study have the fewer negative loadings. 

No attempt has been made in this paper to consider the problems of 
analytic rotation to oblique structure. The quartimax method generalizes 
differently to oblique structure according to whether the logic followed is
that of Carroll [3], Neuhaus and Wrigley [12], or Saunders [13]. The only one
so far programmed for electronic computer is Saunders' (for Illiac), so that
we are not yet in a position to make any systematic study of the oblique
situation. Kaiser ([11], p. 45) has developed a function for generalizing his
varimax procedure to the oblique case, and the mathematics of the problem
have recently been developed by Carroll [3a].




This paper presents only an interim report upon the problems involved
in trying to develop an analytic method of rotation and upon selecting the one
best meeting our psychological requirements. For anyone with access to an
electronic computer, the analytic methods presently available provide at
least a rapid and objective approximation to the desired solution. For those
favoring a simple structure solution, the varimax method appears to be the
best starting point currently available, while for those preferring hierarchical
organization of factors, the quartimax appears the more useful. Analytic
methods should eventually help settle the long-standing controversy between
protagonists of simple structure and of the general factor by making possible
empirical studies to define more precisely the conditions under which
invariance, parsimony, etc., are best achieved.


REFERENCES 


[1] Bolin, S. F. A factorial study of proficiency criteria of aircraft engine mechanics.
Unpublished doctoral dissertation, Western Reserve Univ., 1955.
[2] Burt, C. Group factor analysis. Brit. J. Psychol., Statist. Sect., 1950, 3, 40-75.
[3] Carroll, J. B. An analytical solution for approximating simple structure in factor
analysis. Psychometrika, 1953, 18, 23-38.
[3a] Carroll, J. B. Further notes on analytic simple structure solutions. Mimeographed,
1956.
[4] Eysenck, H. J. Review of L. L. Thurstone's "Primary mental abilities." Brit. J.
educ. Psychol., 1939, 9, 270-275.
[5] Ferguson, G. A. The concept of parsimony in factor analysis. Psychometrika, 1954,
19, 281-290.
[6] Guilford, J. P. The structure of intellect. Psychol. Bull., 1956, 53, 267-293.
[7] Guttman, L. A new approach to factor analysis: the radex. In Paul F. Lazarsfeld
(Ed.), Mathematical thinking in the social sciences. New York: Columbia Univ.
Press, 1954.
[8] Holzinger, K. J. and Harman, H. H. Comparison of two factorial analyses.
Psychometrika, 1938, 3, 45-60.
[9] Holzinger, K. J. and Harman, H. H. Factor analysis. Chicago: Univ. Chicago
Press, 1941.
[10] Kaiser, H. F. An analytic rotational criterion for factor analysis. Amer. Psychologist,
1955, 10, 438. (Abstract)
[11] Kaiser, H. F. The varimax method of factor analysis. Unpublished doctoral
dissertation, Univ. California, 1956.
[12] Neuhaus, J. O. and Wrigley, C. F. The quartimax method: an analytic approach to
orthogonal simple structure. Brit. J. statist. Psychol., 1954, 7, 81-91.
[13] Pinzka, C. and Saunders, D. R. Analytic rotation to simple structure, II: extension
to an oblique solution. Princeton, New Jersey: Educational Testing Service Research
Bulletin RB-54-31, 1954.
[14] Saunders, D. R. An analytic method of rotation to orthogonal simple structure.
Amer. Psychologist, 1953, 8, 428. (Abstract)
[15] Spearman, C. Thurstone's work reworked. J. educ. Psychol., 1939, 30, 1-16.
[16] Thurstone, L. L. Primary mental abilities. Psychometric Monogr. No. 1, 1938.
Chicago: Univ. Chicago Press.
[17] Thurstone, L. L. A factorial study of perception. Psychometric Monogr. No. 4, 1944.
Chicago: Univ. Chicago Press.
[18] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
[19] Wrigley, C. F. and Neuhaus, J. O. The matching of two sets of factors. Amer.
Psychologist, 1955, 10, 418-419.
[20] Zimmerman, W. S. A revised orthogonal rotational solution for Thurstone's original
primary mental abilities test battery. Psychometrika, 1953, 18, 77-93.


Manuscript received 11/29/56 


Revised manuscript received 8/12/57 




PSYCHOMETRIKA—VOL. 23, NO. 2
JUNE, 1958


A DISTINCTION BETWEEN EXACT AND APPROXIMATE 
NONPARAMETRIC METHODS* 


WILLIAM L. SAWREY†


UNIVERSITY OF COLORADO MEDICAL CENTER 


Nonparametric tests are discussed in relation to parametric tests. A
distinction is made between two types of nonparametric tests. One type
leads to an exact significance level, the other to an approximate significance
level. The failure to distinguish between these two types has led to confusion
and error. Examples are cited.


The use of nonparametric statistics in psychology has increased rapidly 
during the past few years. This trend has been reflected by the inclusion 
of sections on nonparametric methods in recent psychological texts, and by 
the appearance of a text and reviews solely on the topic of nonparametric 
methods [3, 6, 15, 18, 19, 21, 22, 26]. 

The increasing familiarity of psychologists with these techniques is
to be welcomed, since these methods may be extremely useful in handling
certain problems. However, in the light of a number of recent statements
which have been made regarding so-called nonparametric methods, it also
appears that misconceptions about the nature of some of these techniques
may lead to inappropriate or at least to injudicious applications. The purpose
of this article is to clarify some of these misconceptions by calling attention
to the essential characteristics of a nonparametric method and by making
a distinction between two types of nonparametric methods. In addition,
certain precautions which should be observed in the use of nonparametric
methods will be noted.
The basic distinction between parametric and nonparametric methods
lies in the fact that the latter are distribution free. In fact, they are
sometimes quite appropriately referred to as distribution-free methods. As the
name implies, they do not depend on a specified distribution of the
population from which samples are drawn. However, nonparametric methods
are not entirely without assumptions. Some techniques assume a continuous
distribution in the parent population. Other assumptions include independence,
same shape of distribution of the parent populations, sampling without
replacement from a finite universe, and symmetrical distribution of the

*Based on a paper presented at the annual meeting of the Rocky Mountain Psychological
Association, Salt Lake City, Utah, 1957.
†The author is indebted to Dr. John J. Conger for his many suggestions that greatly
improved the exposition of this manuscript.




parent populations. One should know the assumptions underlying the tests 
in order that they may be most advantageously employed. Thus, the Wilcoxon 
(Mann-Whitney) Test has been derived in two ways [8, 16]. Either way 
would lead to having “no ties.” Of course in practice because of “crude 
measurements” we often have ties. If ties appear we have a violation of the 
assumptions to some extent, and we no longer have an exact test. Unless the 
ties become a sizeable portion of the total N, then the exact tables are probably 
not too far in error. It should be pointed out that the Wilcoxon Test has been 
derived by White [27], Mann and Whitney [16], Festinger [7], and Haldane 
and Smith [10]. It is also equivalent to the Kruskal-Wallis [13] where only 
two groups are considered. 
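
The sense in which the Wilcoxon (Mann-Whitney) Test is exact when there are no ties can be illustrated by direct enumeration. The following is only a minimal sketch, assuming small samples, no ties, and a null hypothesis under which every assignment of ranks to the first sample is equally likely; the function names are hypothetical and are not taken from any of the cited sources.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def exact_u_distribution(m, n):
    """Exact null distribution of U for sample sizes m and n, assuming no ties.

    U is the rank sum of the first sample minus its minimum possible value,
    m(m + 1)/2. Every choice of m ranks out of m + n is equally likely.
    """
    counts = {}
    for ranks in combinations(range(1, m + n + 1), m):
        u = sum(ranks) - m * (m + 1) // 2
        counts[u] = counts.get(u, 0) + 1
    total = comb(m + n, m)
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}


def lower_tail_probability(m, n, u_observed):
    """Exact probability of a U as small as, or smaller than, the one observed."""
    dist = exact_u_distribution(m, n)
    return sum(p for u, p in dist.items() if u <= u_observed)


# With samples of 3 and 4 there are 35 equally likely rank assignments;
# only two of them give U <= 1, so the exact one-tailed probability is 2/35.
print(lower_tail_probability(3, 4, 1))
```

Because the probability is obtained by counting, no limiting process enters; once ties occur the equally-likely enumeration above no longer describes the data exactly, which is the point made in the text.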

As with any parametric test, the application of a nonparametric test requires
that the assumptions underlying it be met. Certainly the assumptions
underlying nonparametric methods are generally easier to meet; herein lies
their advantage.

For a parametric test in which the assumptions are exactly met, an
accurate probability estimate may be obtained. Since this is also true of
a number of nonparametric methods, accuracy of estimation is not a
distinguishing characteristic between the two types of tests. However, there
are a number of tests, frequently referred to as nonparametric, which do not
yield exact probability estimates; furthermore the relative accuracy of these
tests may depend upon the shape of the underlying distribution. It is with
regard to these tests, which we might refer to as semi-nonparametric, that
many current misconceptions arise.

In order to gain a clearer understanding of the basis of some of these
misconceptions, it seems advisable to draw a distinction between exact
nonparametric methods and the semi-nonparametric methods. To facilitate
an understanding of this distinction, Fisher's exact test for a 2 × 2 table
will be considered. This test can be derived assuming independence of
observations within and between cells. The exact probability of a configuration
of cell frequencies can be calculated, given the marginal totals. Hence this
may be referred to as an exact test. Another so-called nonparametric test
advocated for use in this situation is the chi square test of independence.
It is well known, and has been pointed out most adequately for psychologists
by Lewis and Burke [14], that chi square when used in this type of situation
is approximate. Their article also lists several errors made in using chi square.

Here then are two nonparametric tests for the same hypothesis, one
which is exact and one which is approximate. An exact test is one in which,
if the alpha level of significance is obtained, we are accurate and correct
in stating the odds that this difference could have happened by chance. On
the other hand, with approximate-type tests such as chi square, if we obtain
or exceed the alpha level of significance we cannot be confident of accuracy
in quoting the odds that such a difference could have occurred by chance.
Unlike an exact nonparametric test, the true odds for a chi square may be
more or less than the level of significance indicates, depending on the
accuracy of the approximations involved.
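
The contrast between the exact and the approximate test for a 2 × 2 table may be made concrete with a small computation. The sketch below is illustrative only: the table is invented for the purpose, the function names are hypothetical, the chi square is computed without a continuity correction, and its probability is taken from the limiting chi square distribution with one degree of freedom.

```python
from math import comb, erfc, sqrt


def fisher_exact_upper_tail(a, b, c, d):
    """Exact probability, with margins fixed, of a first cell >= a.

    The 2 x 2 table is laid out as [[a, b], [c, d]]; given the marginal
    totals, the first cell follows a hypergeometric distribution.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)
    favorable = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return favorable / total_tables


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi square for a 2 x 2 table and its approximate probability."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi square distribution with one degree of freedom.
    return stat, erfc(sqrt(stat / 2))


# Hypothetical table with every marginal total equal to 4.
exact_p = fisher_exact_upper_tail(3, 1, 1, 3)   # 17/70, about 0.243
stat, approx_p = chi_square_2x2(3, 1, 1, 3)     # chi square = 2.0, p about 0.157
print(exact_p, stat, approx_p, approx_p / 2)
```

For this small table the exact one-tailed probability is 17/70, while halving the chi square probability to obtain a one-tailed value gives roughly 0.08. The size of the discrepancy depends on the cell frequencies; the example is chosen only to show that the approximate probability need not agree with the exact one.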
While the distinction between a parametric test and an exact nonparametric
test seems relatively clear, the distinction between these two types of tests
and approximate nonparametric tests seems a little confused […]. In the case
of semi-nonparametric or approximate tests, the distribution of the parent
populations from which samples are drawn may be […], but these tests really
depend on limiting processes for their […]. Therefore, they are not parametric
tests. On the other hand, the level of confidence obtained is not exact, and
they cannot be called exact nonparametric tests.
Because of the wide use of chi square it will be taken up in more detail
to illustrate the ways in which failure to distinguish between exact and
approximate nonparametric statistics may lead to confusion and error.
An examination of the derivation [5, 12] of chi square from the multinomial
distribution indicates that three approximations are employed. These are
Stirling's approximation for factorials, an approximation similar to truncating
an infinite series, and in essence the substitution of an integration for discrete
summation.

Because chi square can be derived without making any assumption
about a parent population or any assumptions about parameters of the
multinomial from which it is derived, it has been called distribution free or
nonparametric. However, it is clear that the more accurate the three
approximations mentioned are, the more accurate the chi square test. The accuracy
of these approximations depends on the speed of convergence to the limiting
value. As Birnbaum states, "The χ² statistic becomes approximately
distribution-free for N → ∞ but is not distribution-free for finite N, and little
[…] in which its actual distribution for finite N and […] limiting distribution"
([2], p. 435). Calling […] without being careful to distinguish between […]
can lead to confusion and error as can […] and Lev: "Examples of […] been
studied are the χ² test, […] the percentiles, ... If the samples […] normal
population and we use a non-parametric test with level […] α, then for any
parent […] whatever the probability of an error of the first […] α or less
than α because of discreteness" ([26], p. 42[…]). For chi square, which the
above authors call a nonparametric test, the statement is incorrect. The same
type of conflicting statement occurs in Mosteller and Bush [19].

Another illustration of the confusion that may result from failure to
distinguish between parametric and semi-nonparametric tests is found in
another recent text. After advocating the use of chi square as a nonparametric
test the author states in a footnote, "Using a parametric technique, […]
reached the same decision. He used the critical ratio technique […]" […].
[…] the critical ratio depends upon […]. Given normality and known population
[…] to an exact test. If normality is not […], an exact critical ratio test is
not possible. […] indicates that population variances were not […] that both
of the above techniques […]. When proportions are considered, the square
of the critical ratio is actually equal to chi square with one degree of freedom
([15], pp. 227-228).
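
The identity between the squared critical ratio for two proportions and chi square with one degree of freedom can be checked numerically. The sketch below assumes the usual pooled-proportion form of the critical ratio and the uncorrected chi square for the corresponding 2 × 2 table; the frequencies are invented for illustration and the function names are hypothetical.

```python
from math import sqrt


def critical_ratio(x1, n1, x2, n2):
    """Critical ratio (z) for the difference between two independent proportions,
    using the pooled proportion to estimate the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: 12 of 20 successes in one group, 5 of 20 in the other.
z = critical_ratio(12, 20, 5, 20)
chi_sq = chi_square_2x2(12, 8, 5, 15)
print(z ** 2, chi_sq)  # the two values agree apart from rounding error
```

Since the two statistics are algebraically the same quantity, referring one to the normal table and the other to the chi square table gives the same two-tailed probability, and both rest on the same approximation.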

In order to appreciate fully that approximate nonparametric tests may
be confused with […], especially if used as an approximation (i.e., when the
assumptions are not fully met), let us look further at chi […] theorem. There
are several slightly different theorems showing […] to normality of the […]
apparent that the dis[…] which samples are drawn, and to […] quite general.
The crucial factor […]

[…] "I had once or twice suggested to […] F test might serve as an
approximation even […] 1's and 0's. As a testimony to the modern teaching
[…], this suggestion was received with incredulity, the objection being made
that the F test requires normality, and that a mixture of 1's and 0's could
not […] imagination […] normal" […]. However, if an […] may be likely […]
error than the parametric […]
[…] method occurs when the assumptions are met and when it is also an exact
test. Only in this case can the obtained level of significance be considered
exact. This advantage […] can be lost both by using approximate nonparametric
techniques […] nonparametric test when the underlying assumptions […].
[…] nonparametric methods that are exact have large sample approximations
given when the sample size exceeds that for which tables are […]. […] large
sample approximations again depend on limiting processes for their accuracy;
as a result these large sample tests become […] tests, a fact which is
frequently not recognized. Furthermore, Fix and Hodges [8] have shown they
can be "subject to sizeable per[…] errors." Apparently one even has to be
careful of "exact tables." Fix and Hodges state that the Mann-Whitney Test
tables by White [27] and Auble [1] "give significant probabilities in most
cases with even less accuracy than the normal approximation" ([8], p. 301).
Fortunately in the same article the authors provide appropriate formulas and
tables for calculating the […] probabilities. While one may often not err
seriously by using approximate techniques, he should be aware that he is
making an approximate test.

In view of the above discussion, it seems apparent that when an exact
nonparametric test is available for testing a particular hypothesis, it should
always be used unless the labor is completely prohibitive, the significance of
the result is apparent in any event, or a parametric test is applicable. In view
of the fact that research on a problem often takes several or even hundreds of
hours, it does not seem unreasonable to recommend that an investigator use
an exact calculation that may take him up to an hour or even two hours.
Computational methods for exact probabilities for contingency tables are
given by Freeman and Halton [9]. These may become computationally
laborious, but if the research is sufficiently important and chi square gives
a […] approximation, then there is really no substitute for such methods.
Further, if an investigator has to employ an approximate method, he should
use the most accurate one available. Sometimes the original source of a test
gives various approximations. […] more research is needed to […] tests […]
as well as the […] conditions where the assumptions are not completely met.

[…] using an exact or an approximate […], one rough guide is that, […]
whenever an investigator is […] technique and chi square, F, t, Beta, Gamma,
or Normal tables are used to determine the […], then he is really using a
semi-nonparametric test. Sometimes this procedure […] even when an exact
test is available. For example, Fisher's exact test for a 2 × 2 table can be


used in place of the standard chi square test of independence, and also in 
place of the chi square used in a two group median test. Likewise, exact tests 
[9] are available for some of the nonparametric analysis of variance situations 
described by Mood [17], Wilson [31], and Roy and Mitra [20]. 


Further Considerations in the Use of Nonparametric Tests 


The power of a nonparametric test when compared to its parametric 
counterpart (when all assumptions are met for the parametric test) is always 
smaller. However, one must consider the fact that nonparametric tests are 
used when parametric tests are not appropriate because some assumptions 
are not met. In this case, it seems plausible, and in fact has been demonstrated, 
that the nonparametric methods may in some circumstances actually have 
more power than the parametric methods [28]. In other words, a nonpara- 
metric test would be more likely to reject the null hypothesis. One can easily 
construct data to demonstrate this fact. 

Another related consideration which should be borne in mind is that 
nonparametric techniques most often test slightly different hypotheses than 
do parametric tests, as, for example, a median difference rather than a mean 
difference. If the distributions are skewed, these become definitely different 
hypotheses. Thus, one has to be especially careful to be aware of the exact 
statistical hypothesis under test when using a nonparametric technique and 
to relate his conclusions to this particular hypothesis. To illustrate, a non- 
parametric test may be sensitive to differences in shape, variability and 
location [24, 25]. Others may be sensitive mainly to location differences 
[13, 29] or mainly to variability differences [11, 23]. 

In conclusion, the following suggestions regarding the use of nonparametric
tests can be specified. (a) Consider just how much error there may be
in an approximate test. (b) Always use an exact test when at all feasible.
(c) Carefully consider the assumptions involved in order to determine whether
an exact nonparametric test is possible. In some instances, this can only be
done by consulting the original source. (d) Select the tests carefully in relation
to the hypothesis being tested and any alternative hypotheses that the
investigator may wish to guard against.


REFERENCES 


[1] Auble, D. Extended tables for the Mann-Whitney statistic. Bull. Inst. educ. Res.
Indiana Univ., 1953, 1, 39.

[2] Birnbaum, Z. W. Numerical tabulation of the distribution of Kolmogorov's statistic
for finite sample size. J. Amer. statist. Ass., 1952, 47, 425-441.

[3] Blum, J. R. and Fattu, N. A. Nonparametric methods. Rev. educ. Res., 1954, 24,
467-487.

[4] Cochran, W. G. The comparison of percentages in matched samples. Biometrika,
1950, 37, 256-266.

[5] Cramér, H. Mathematical methods of statistics. Princeton: Princeton Univ. Press, 1946.

[6] Edwards, A. L. Statistical methods for the behavioral sciences. New York: Rinehart,
1954.

[7] Festinger, L. The significance of differences between means without reference to the
frequency distribution function. Psychometrika, 1946, 11, 97-105.

[8] Fix, Evelyn and Hodges, J. L., Jr. Significance probabilities of the Wilcoxon test.
Ann. math. Statist., 1955, 26, 301-312.

[9] Freeman, G. H. and Halton, J. H. Note on an exact treatment of contingency, goodness
of fit, and other problems of significance. Biometrika, 1951, 38, 141-149.

[10] Haldane, J. B. S. and Smith, C. A. B. A simple exact test for birth-order effect. Ann.
Eugen., 1947-49, 14, 117-124.

[11] Kamat, A. R. A two-sample distribution-free test. Biometrika, 1956, 43, 377-387.

[12] Kendall, M. G. The advanced theory of statistics. Vol. 1. London: Griffin, 1948.

[13] Kruskal, W. H. and Wallis, W. A. Use of ranks in one-criterion variance analysis.
J. Amer. statist. Ass., 1952, 47, 583-621.

[14] Lewis, D. and Burke, C. J. The use and misuse of the chi-square test. Psychol. Bull.,
1949, 46, 433-489.

[15] McNemar, Q. Psychological statistics. New York: Wiley, 1955.

[16] Mann, H. B. and Whitney, D. R. On a test of whether one of two random variables
is stochastically larger than the other. Ann. math. Statist., 1947, 18, 50-60.

[17] Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950.

[18] Moses, L. E. Non-parametric statistics for psychological research. Psychol. Bull.,
1952, 49, 122-143.

[19] Mosteller, F. and Bush, R. R. Selected quantitative techniques. In G. Lindzey (Ed.),
Handbook of social psychology. Vol. 1. Theory and method. Cambridge, Mass.: Addison-
Wesley, 1954. Pp. 289-334.

[20] Roy, S. N. and Mitra, S. K. An introduction to some non-parametric generalizations
of analysis of variance and multivariate analysis. Biometrika, 1956, 43, 361-376.

[21] Siegel, S. Non-parametric statistics for the behavioral sciences. New York: McGraw-
Hill, 1956.

[22] Smith, K. Distribution-free statistical methods and the concept of power efficiency.
In L. Festinger and D. Katz (Eds.), Research methods in the behavioral sciences. New
York: Dryden, 1953. Pp. 536-577.

[23] Sukhatme, B. V. On certain two-sample nonparametric tests for variances. Ann.
math. Statist., 1957, 28, 188-194.

[24] Swed, Frieda S. and Eisenhart, C. Tables for testing randomness of grouping in a
sequence of alternatives. Ann. math. Statist., 1943, 14, 66-87.

[25] Wald, A. and Wolfowitz, J. On a test whether two samples are from the same
population. Ann. math. Statist., 1940, 11, 147-162.

[26] Walker, Helen M. and Lev, J. Statistical inference. New York: Holt, 1953.

[27] White, C. The use of ranks in a test of significance for comparing two treatments.
Biometrics, 1952, 8, 33-41.

[28] Whitney, D. R. A comparison of the power of nonparametric tests and tests based
on the normal distribution under non-normal alternatives. Unpublished doctoral
dissertation, Ohio State Univ., 1948.

[29] Wilcoxon, F. Individual comparisons by ranking methods. Biometrics Bull., 1945, 1,
80-83.

[30] Wilcoxon, F. Probability tables for individual comparisons by ranking methods.
Biometrics, 1947, 3, 119-122.

[31] Wilson, K. V. A distribution-free test of analysis of variance hypotheses. Psychol.
Bull., 1956, 53, 96-101.


Manuscript received 7/18/57

Revised manuscript received 11/25/57


BOOK REVIEWS 


LEE J. CRONBACH AND GOLDINE C. GLESER. Psychological Tests and Personnel Decisions.
Urbana, Illinois: University of Illinois Press, 1957, pp. xii + 165. $3.50.


Decision theory has created quite a stir in mathematical statistics, economics, and
[…] of psychology. In decision theory statisticians have found a comprehensive
[…] established statistical procedures. Economists have profitably used
both decision theory and its brother, game theory, in analyzing complex competitive
relationships. Psychologists who hoped that these theories could serve as a descriptive model
of actual […] behavior have made less headway, since people seldom behave optimally as the
theories require. The personnel psychologist, however, aspires to "optimum" behavior in
selection and placement procedures, so his problems seem made to order for decision
theory. Cronbach and Gleser hope to "stir up the reader's thoughts" about this possibility.

The decision process, as viewed in the monograph, starts with some information
about an individual and a strategy for using this information to make either an
investigatory or a terminal decision. If the decision is to investigate, e.g., get more test
scores, […] is then applied to the augmented information. If the decision is to
[…], the individual is assigned to one of two or more treatments, e.g., accept or
reject. The result of the treatment is an outcome, or payoff, that must be evaluated on a
utility scale. The problem is to choose the strategy that yields the largest expected
utility. In the analysis it is assumed that the test information is available as a single score,
[…] be a composite, or as a set of orthogonal factor scores called "aptitudes." It
is also assumed that for any particular treatment, the expected payoff is a linear function
of aptitude. Strategies are evaluated in terms of the increase in utility over that resulting
from the best a priori strategy. With these ground rules, Cronbach and Gleser proceed to
analyze several special decision processes.


In considering the standard problem of selection with single-stage testing, "fixed-
treatment" and "adaptive-treatment" procedures are discriminated. In the former, the
men are treated in a predetermined way, e.g., all are given a job with fixed specifications
or are admitted to a course taught in a predetermined way. In adaptive treatment,
[…], e.g., job specifications or teaching methods, are specified; both
the men and the treatment are chosen optimally, so that the treatment is adapted to the
particular men available. Utility increase is shown as a joint function of validity and
selection ratio for each procedure. Under the assumptions stated, utility is a linear function
of validity in the fixed-treatment case, […] that the authors use as a stick for
beating the coefficient of alienation and the index of forecasting efficiency.

In addition to the single-stage decision process, problems of placement, classification,
and two-stage sequential selection are treated in some detail. […] strategies
are described for each; the effects of validity, selection ratio, […]
are presented. Some problems of optimum test length are discussed under the heading
[…] the Bandwidth-Fidelity Dilemma. The authors consider the design and evaluation of
tests or batteries that are to be used […] several different decisions. It is shown
that […] utility […] compositely over all decisions to
which it […] validity coefficients. Utility also
provides a criterion for choosing between many short tests of different aptitudes (bandwidth)
[…] one or two long, reliable tests […]. […] the book […]
questions […] outcomes, and […] utilities. While this last problem
is far from solved, the authors argue cogently that utility measurement is no more
subjective and arbitrary than conventional […] and that utility has the
advantage of being explicit.


The book is well organized and has many pertinent figures. Ample references are 
made to the relevant literature. Most of the mathematical development has been placed 
in a series of appendices, while the main text states the problems and discusses the results. 
Nevertheless it is a book for specialists, assuming knowledge of psychometrics and personnel
psychology, and requiring some mathematical sophistication. The book has been produced
by photo-offset or its equivalent, but the unjustified right margins are not distracting;
the equations and the text have been very carefully prepared. On the other hand, the small
type and the soporific style combine to dull the stimulating effect of the new ideas.

It is clear that the authors view their book as more than a stirring rod—indeed 
they hope that it is the harbinger of a new test theory. Conventional test theory has focussed 
attention on the test score. Most of the chapters in Gulliksen’s Theory of Mental Tests 
are concerned with the properties, meaning, and interpretation of test scores. Cronbach 
and Gleser have focused attention on outcomes. Their extensive analysis of personnel 
problems is from a view point that may be characterized as “Validity for What?” Their 
success in dealing with a wide variety of problems in a single framework is impressive.
Decision theory lends coherence to a diverse testing literature focused on outcomes. If
enough professional testers manage to read the book, there is a good chance that the authors'
hopes for a new test theory will be realized.


Bert F. Green, Jr.
M.I.T. Lincoln Laboratory 


JOE K. ADAMS. Basic Statistical Concepts. New York: McGraw-Hill Book Company,
1955, pp. xvii + 304. 


At a time when mathematical statisticians are writing “know-no-mathematics” texts 
for budding research workers and any others who will read them, here is a first text written 
by a social scientist in which there is no pretense of avoiding mathematics. Adams states
two purposes for the book: (1) "to develop some basic mathematico-logical concepts of
statistics, particularly the logic of statistical inference" and (2) "to develop an
understanding of the language used in mathematical statistics, including elementary calculus."
The book is intended “primarily as a text for a one- or two-semester course for students 
who have had little or no previous calculus or statistics.” Adams’ premise is “that the 
college student, whether oriented toward applications or toward mathematics, can best 
spend his time and energy in mastering some of the abstract concepts, i.e., mathematical 
models, and some of the mathematical language of the field.” 

These are some challenging ideas but first, just what has Adams done to implement
this point of view? Without too much injustice, the main text of the book can be divided
into three sections: the first on basic statistical concepts (actually the title of the book), a
second exhibiting the notions of differentiation and integration within the context of
continuous distributions, and a final section dealing with normal, chi square, t, F, and bivariate
distributions and their applicability in statistical inference.

Most of the concepts peculiar to elementary statistics, with the exception of correlation
and regression, are found in the first four chapters of the book. The approach to
these ideas and the order in which they are developed are interesting and sometimes novel.
For example, population is defined as "a value function, that is a class of ordered pairs
such that the second member of each pair is a member of a set and the first member of
the pair is the value of that member of the set." When the topic of statistical inference
is developed, it is done using only finite populations. After some instruction in how to
count using combinatorial methods, but before any formal consideration of measures of
central tendency and variability, one meets tests of significance, confidence intervals,
Type I and Type II errors, and factors influencing the power of a test.


The vehicle for the […] statistical […] an unfortunate result. It contributes […]
between testing of hypotheses and determining of confidence intervals. Adams uses the
[…] level interchangeably and nowhere in the book does he […] to increase the […].

In the […] of the book the approach and style are much more terse
[…] other books directed at students of the same background. It takes only
[…] of the first chapter to set forth the "problems dealt with by statistics."
[…] theorems are stated formally and rigorously. Many theorems are
[…] pages of text. More difficult theorems are proved in a 36-page mathematical
[…] which deals with topics such as multiple integration, moments, Tchebycheff's
inequality, […] of chi square distributions, and bivariate normal distributions.

[…] bowing to the fact that a student may know only the calculus which
[…] a few chapters earlier, most of the last section of the book reads like an
[…] in mathematical statistics. The treatment of normal, t, F, chi square, and
[…] is really much closer to what one would find in Cramér's new elementary
book, The Elements of Probability Theory, than it is to any of the most popular
competitors for the market of undergraduate (or even graduate) statistics courses in
psychology departments.

[…] criticism of certain chapters in the last section of the book is that they
contain few worked examples. Though the chapter on t begins with its distribution in
terms of the gamma function and ends with a criterion for discarding exceptional
observations, nowhere in between does Adams work a single problem. Even the usually plentiful
exercises are slighted in this chapter. There is only one exercise in which two independent
groups are compared though there are two problems using related measures. In the chapter
on the F distribution the only worked examples are a mixed-model problem and one
showing the use of Tukey's procedure for comparing individual means.

[…] finishing this book one is inclined to ask whether he might use it as a text.
[…] done many things well. He defines terms with rigor. Many of the exercises are
worth the price of the book to a teacher of statistics. Technical errors are few. A fresh
treatment is given to many topics.

[…], then, is the book potentially useful? A negative answer is that it is hard to
see how it would fit into the undergraduate curriculum of most social science departments
and, in particular, psychology departments. The day of the required statistics course for
undergraduate majors in psychology is here, and the simple fact is that Adams' book would
be too difficult for many students.

[…] this critical comment is a consequence of the general theoretical approach
to […], and in part it is a consequence of the particular way in which Adams
exemplifies this approach. Adams ties rigor and brevity. These should be and are linked in
the statement of definition and theorems. The difficulty is that the brevity at these points
extends into the nonmathematical expository material. Illustrations of this
point are numerous. […] discussion of […] the median […] requires five lines and is
completely void […] any additional material would have […]
the notion of the power of a test of significance. In the chapter on […]
you have to go all the way to the exercises at the end of the […]
any idea that regression and correlation relate to something more significant […]
in a bowl.

But what of the general approach apart from its exemplification in Adams' book?
Is it a good premise that first training in statistics is best spent on mastering mathematical
[…] mathematical language in the field? It is too bad that the answer to this question
[…] as […] of the proofs in the book. Certainly no answer can
be given which transcends time and individual differences, but the point of view taken
here is that there are better ways of teaching elementary statistics in social science
departments. Asking students to learn calculus (in a statistics course) so that they may better
understand statistics is a little like asking them to learn statistics in a foreign language.
The concepts of calculus must be pretty well in hand if they are to be of any real conceptual
aid to the student of statistics.
Another difficulty associated with this general approach is that it does not fit har- 
moniously with the growing philosophy that a course in statistics should be a part of a well- 


rounded education. So much emphasis on distribution theory and on calculus takes space 
and time from the uniquely statistical concepts and their wide applicability in society 
today. 


and differentiate with some ski 
differentiation, 

Neither the book nor any review is likely to change many opinions on how the first 
course in statistics should be taught. Despite the reviewer's disagreement with the author 
on this issue it is strongly recommended that those who prefer the mathematical approach 
consider this book, 


ROBERT E. MORIN
University of Texas 


Garrett, Henry E. Elementary Statistics. New York: Longmans, Green and Company,

This little book (146 pages, exclusive of tables) is a shorter and more elementary
version of an earlier introductory text by the same author.

on frequency distributions, averages, variability, : 

hypotheses, correlation, chi Square, and comparin 
The emphasis throughout the b 

of the pages is devoted 


/ but nowher, ‚ more 
testing hypotheses, » Where are they 


On page 97 the text reads, “The nul] 
that the true mean difference ебе; 




When sampling is considered in this chapter, the ex iv i i 
random sampling are really cases of more аек A EE X : 
Three sentences from page 91 illustrate this problem. “Various devices, [...]
which apply in every situation, have been employed to guarantee a random sample.

In the problem stated above, for example, the experimenter would try to select children
proportionally from all the elementary schools in the city, thus including all intellectual
and socioeconomic levels. When the population is on file (telephone directory or civil
service list) every twentieth or even five-hundredth name might be chosen.” The problem
of sampling is further confused by statements like the following one found on page 90.
“In order to infer from the performance of a sample (its M, for instance) what performance
can be expected from the population, the sample must be representative of its population.”
Were this true, of course, either all random samples would have to be called representative
samples or we could not make inferences from any random samples.

Several other criticisms of this chapter might be listed. Critical ratio and t are used
interchangeably. “Significant differences” are equated with “real” differences and are
presented as the opposite of differences of no consequence. Correlated percentages are
compared as though they were independent.

Even though some other chapters in this book are better than the chapter on testing
hypotheses, it is not possible to recommend this book for use in elementary classes when
so many more suitable books are available.

ROBERT E. MORIN
University of Texas


W. ALLEN WALLIS AND HARRY V. ROBERTS. Statistics: A New Approach. Glencoe: Free
Press, 1956. Pp. xxxviii + 646. $6.00.

There are really several new approaches used in this book. These include the use of
actual examples from many fields to show the universality of statistics, the teaching of
statistics as a field in its own right, the introduction of nonparametric techniques, and
the elimination of Student's t and the F ratio from their accustomed place. The pleasing
style and apparent freedom from numerical mistakes are also novel, or at least to be
encouraged.

This book is divided into four major categories: The Nature of Statistics, Statistical
Description, Statistical Inference, and Special Topics. Each section includes about 150
pages, so the book is extraordinarily long, too long to be covered in a one semester
course in statistics.

The authors talk of the avoidance of mathematics as a necessity, and, in fact, a virtue.
Some simple algebra is essential, however, and after 200 pages of words, the numbers
finally come up. Students who took and understood high school algebra will have no
trouble with [...] in this book; those who did not understand it get no magic release
from [...] and minus signs. [...] Seasonal Patterns [...] Level, Lake Michigan-Huron.
The reviewer fo[...] many interesting [...] vehemently that the sheer number of [...]
down too much and made the book, particularly the first part, [...] is the numbering of
problems, tables, and examples from the page [...] which they are found.

[...] psychological [...] significance of the difference between [...] important of
methods to be learned. To teach this [...] place after [...] standard deviation, one must
[...] from Chapter 11, the null hypothesis from Chap[...], [...] difference between means
from Chapter 13.



The null hypothesis is covered rather too completely for an elementary text, with 
operating characteristic curves probably out of place. Why analysis of variance is brought 
up at all in Chapter 13 is not clear—especially since the F test is played down. Correlation, 
on the other hand, which certainly is a subject which should be covered extensively in a 
course in psychological statistics, is one of the special topics. It is introduced as a ratio
of the standard error of estimate to the standard deviation. No handy computational 
formulas for r are given. 

As may be inferred from the foregoing, this is no cookbook—nor is it a book where 
the occasional user of statistics can look for a formula or table. Perhaps the addition of 
an appendix with computational formulas and more tables would make this a more valuable 
book. 

This book seemed less than ideal for a one-term introduction to psychological statistics. 
It is an outstanding book for: (1) a full-year course in statistics, particularly an inter- 
departmental course; (2) learning statistics extracurricularly; (3) use as a source book 
for nonparametric techniques and for a wealth of examples. 

CHARLES L. WOOD
University of Texas 


JEROME S. BRUNER, JACQUELINE J. GOODNOW, AND GEORGE A. AUSTIN. A Study of Thinking.
New York: John Wiley and Sons, Inc., 1956. Pp. xi + 330. $5.50.


There are perhaps two possible reasons for reviewing a book in Psychometrika—one 
being that the work is an example or discussion of quantification in psychology, and the 
other being that the work is provocative for the use of quantitative methods and models. 
A Study of Thinking is the latter kind of work. It is a highly verbal book offering much
interesting material which might be incorporated in a rigorous approach to
problem-solving behavior.

At the outset the authors state: “The learning and utilization of categories represents 
one of the most elementary and general forms of cognition by which man adjusts to his 
environment. It was in this belief that the research reported in this volume was undertaken. 
For it is with the categorizing process and its many ramifications that this book is
principally concerned” (p. 2). “Categorizing” is defined as behavior involving the
placement or grouping of objects or events on the basis of selected cues. A distinction is
made between category (or concept) formation, which is “the inventive act by which classes
are constructed,” and concept attainment, which is the search for and testing of attributes
of objects and events in order to distinguish different categories. For example, the work
of a physicist in distinguishing between substances that undergo fission is not to form
the concepts “fissile” and “nonfissile” but to determine the attributes or properties that
are associated with fissile and nonfissile substances. Concept attainment as analyzed by
the authors consists of the following aspects: (1) There is an array of instances to be
tested which can be characterized in terms of attributes, e.g., color, weight, etc. (2) As
instances are encountered, a person makes decisions whether the sample before him is in one
category or another. (3) Any given decision will be found to be correct, incorrect, or
indeterminate. This is called “validation” of a decision. (4) Each decision provides
potential information by limiting the number of attributes and attribute values that can be
considered predictive of category membership. (5) The sequence of decisions made on the way
to concept attainment is considered as a strategy which has certain objectives, i.e., to
maximize the information obtained from each decision and test, to keep the “cognitive
strain” (undue strain on memory or inference) within limits, and to regulate the risk of
failing or other decision consequences. It is the description of behavior of this kind that
is the task of this book.

“To the reader conversant with contemporary American psychology,” the authors
state at the end of Chapter 1, “the book will appear singularly lacking in the more familiar
forms of theoretical discourse. Neither the language of learning theory, of Gestalt theory,
nor of psychoanalysis will be evident save in the form of incidental reference. For our
objective has not been to extend reinforcement theory or the theory of traces or any other
prepared psychological position to the problems of categorizing. We have not ignored the
rich theoretical backgrounds of contemporary theory. Rather, we have come gradually
to the conclusion that what is most needed in the analysis of categorizing phenomena ... is
an adequate analytic description of the actual behavior that goes on when a person learns
how to use defining cues as a basis for grouping the events of his environment” (p. 23).
This statement reveals two major characteristics of A Study of Thinking. First, there is a
de-emphasis of the use of available language which has the disadvantage of contributing
another set of definitions and explanations in contrast to a contribution toward consolidating
our present explanations. Secondly, the lack of systematic position often makes the very
interesting variables that are considered seem like a stimulating pot pourri rather than a
systematically generated set of variables for study.

Three category types are distinguished. (1) Conjunctive—defined by the joint presence 
of the appropriate values of several attributes, e.g., in a set of cards containing combinations 
of varying values of several attributes, the category defined by number, redness, and 
circles (all cards containing three red circles) is a conjunctive category. Most uses of the 
Vigotsky Test and Wisconsin Card Sorting Test employ this kind of categorizing. (2) 
Disjunctive—defined by the presence of appropriate values of several attributes or any 
constituent thereof, e.g., the possession of three red circles or three figures, red figures, 
circles, three red figures, red circles, or three circles. (3) Relational—defined by a specifiable 
relationship between attributes that define a category, e.g., the same number of figures as 
colors or fewer figures than colors. Following the statement of these definitions, Chapter 3,
entitled “The Process of Concept Attainment,” is a rich and meaty chapter in which the
authors are at their best in the sense that they present, on the basis of past work and their
own insights, what appear to be important variables to be investigated in [...] and
concept attainment. The technique employed [...] to look at examples of everyday
behavior and to analyze these situations in keen detail.

Chapters 4 through т, approximately 60 per cent of the book Li iei em 
appendix), are devoted to presenting «i. a series of several Lap кене ме Ed 
Concept attainment and concept utilization ... constructed to wid us үт mee ien, 
involved when a person learns to group discriminably different Ene nd pire 
Into equivalence classes and to recognize new members of these ¢ — ат 
ег (p. 80). The aim of the e c Е voee bg amens the experiments 
solving behavior and this is ingeniously ear ine NUR arc not always specifically 
z erred to are not completely rigorous ones; dep presented when group differences pro- 
ашаа, and tests of statistical significance 2 experiments provide a means of observ- 
Vide major findings for disc | hors perform an insightful sort 


Ing the perform ots on the b: 4 

nance of subjects on the е tas requirements. 

9f task analysis of important variables under different ta king suggests for quantitatively- 

T With repect to the leads which 4 Study of Ts ODE points. 

minded psychologists, the writer of this review submits ` p xi strategy using notions of 

1. The authors employ the concept of = яе Term theory analysis. Their m 

maximizi ike Y matrices analogous io 2 4 onditions that 

Vestigati ЧЕ ЧЫ nnd pes oft ma E s essentially of studying E > of theoretical 

affect ê St pude behavior re is the ley qe аре БЕ 
use of these ideal strategie? - be that a fruitful à de 

Models whi ies. Ib may De V7 tions similar to 
ich ca " se strategies Е е assumptlo а 

the Lice eed. emer i theory which starts W i de БА УБ variables 

T a quant оез on ^ 

е rational. : ory and then 8008 ^^.» (atjons about what con 

Such as E мале coi Mox e eg. cultural biases, expectat! 
inted up in this А-ы 




stitutes successful solution, patterns of success (reinforcement schedule), and Ege e 
inference. In regard to the use of models the authors state the general conclusion that
“... we are not prepared to develop or utilize as yet any formal or mathematical model
to predict the effect of anticipated consequences on categorizing judgments. We have
chosen to be satisfied with less precise prediction and to concern ourselves with the
psychological questions which must eventually underlie any model” (p. 77). In contrast to
this is the current attitude of many psychologists that the use of formal models has
heuristic value and can contribute to the identification and description of the
“psychological questions” with which they are concerned.
2. In considering categorization with probabilistic cues, i.e., situations in which
“certainty of inference from defining attributes to categorical identity cannot be achieved,”
the authors mention the similarities of their experimental situations to the well-known
Humphreys “guessing” situation. (They point out, however, that the increased stimulus
complexity of their experiments provides more opportunity to observe the kind and number
of cues to which the subject is attending.) The event-matching behavior observed by
Humphreys and in subsequent experiments by Grant, Estes, and others is explained by the
authors primarily in terms of the hope of a unique solution, the need for a direct test of
the hypothesis, and the tendency of subjects to regard it as more skillful to predict the least
frequent alternatives. The “matching law,” as it has been called by Estes, has been a focus of
statistical learning theory in describing such verbal learning situations. It seems then
that this kind of quantitative theory should be scrutinized for its applications to the
investigation of the behavior studied in this book.
3. Much is made of the consequences of categorizing in terms of value or utility.
The authors suggest that the scaling of such judgments might be a profitable enterprise in
the analysis of categorizing behavior.
4. The authors state that “... it is our feeling that the unit of analysis now called
the ‘response’ will have to be broadened considerably to encompass the long, contingent
sequence of acts that, more properly speaking, can only be called a ‘performance’” (pp. 55f.).


In this regard it would be interesting to consider analyses similar to those used to study the
informational characteristics of sequential dependencies in language. In another context,
some formal analyses have been made of the behavioral sequences in “trouble-shooting”
problem solving with electronic equipment. This work seems relevant here.
5. The book provokes thought on the relationships between personality variables
and problem-solving approaches. This offers interesting measurement problems to the test
constructor in connection with problem-solving aptitude. The writer is currently engaged
in the investigation of relationships between various measures of “rigidity” and proficiency
in solving routine and novel problems.
Finally, it should be said that the book is stimulating reading. It contains extremely
thoughtful analyses of behavior usually called problem solving. Many variables which
appear to be highly relevant for establishing the necessary functional relationships for the
scientific description of this behavior are richly described. On the other hand, the book is
somewhat wordy and stretched out in parts. The appendix by Roger W. Brown discusses
language as a system of categories and is only somewhat related to the mainstream of the
book. One wonders whether the work reported should not have been published in monograph
form rather than as a book.

ROBERT GLASER
University of Pittsburgh
American Institute for Research


PSYCHOMETRIKA—VOL. 23, NO. 3 
SEPTEMBER, 1958 


THE VARIMAX CRITERION FOR ANALYTIC ROTATION IN
FACTOR ANALYSIS*


HENRY F. KAISER
UNIVERSITY OF ILLINOIS


An analytic criterion for rotation is defined. The scientific advantage
of analytic criteria over subjective (graphical) rotational procedures is
discussed. Carroll's criterion and the quartimax criterion are briefly reviewed;
the varimax criterion is outlined in detail and contrasted both logically and
numerically with the quartimax criterion. It is shown that the normal varimax
solution probably coincides closely to the application of the principle of simple
structure. However, it is proposed that the ultimate criterion of a rotational
procedure is factorial invariance, not simple structure, although the two
notions appear to be highly related. The normal varimax criterion is shown
to be a two-dimensional generalization of the classic Spearman case, i.e., it
shows perfect factorial invariance for two pure clusters. An example is
given of the invariance of a normal varimax solution for more than two
factors. The oblique normal varimax criterion is stated. A computational
outline for the orthogonal normal varimax is appended.


In factor analysis, an analytic criterion for rotation is defined as one
that imposes mathematical conditions beyond the fundamental factor
theorem, such that a factor matrix is uniquely determined. Historically,
the first such criterion was Thurstone's treatment of the principal axes
problem [10]: from any arbitrary factor matrix he suggested rotating under
the criterion that each factor successively accounts for the maximum variance.
But principal axes have seldom been accepted as psychologically very
interesting ([9], p. 139). The rotation problem for psychologically meaningful factors
is usually handled judgmentally. Scientifically, however, this procedure is
not very satisfactory: the ad hoc quality of subjective rotation makes
uniquely determined factors impossible; only factors that are subject to the
uncertainties and controversies besetting any a posteriori reasoning can be
defined. In contrast, an analytic criterion for rotation would allow factor
analysis to become a straightforward methodology stripped of its
subjectivity and a proper tool for scientific inquiry.


*Part of the material in this paper is from the writer's Ph.D. thesis. I am indebted
to [...] Tyler, R. [...], [...] Carter, chairman, for many helpful suggestions and
criticisms. [...] the name varimax, and wrote the original IBM 602A computer program
for [...]. I am also indebted to the staff of the University [...] Computer Center for
their help in programming the procedures described in the paper for [...] IBM 701
electronic computer. Since their installation is partially supported by a grant from the
National Science Foundation, the assistance of this agency is acknowledged.

The Quartimax Criterion

[...] determining psychologically interpretable factors [...] Carroll [1]. In an
attempt to provide a mathematical explication of Thurstone's simple structure, he
suggested that for a given factor matrix,

(1)    $f = \sum_{s<t} \sum_{j=1}^{n} a_{js}^2 a_{jt}^2$

should be a minimum, where $j = 1, 2, \ldots, n$ are tests, $s, t = 1, 2, \ldots, r$
are factors, and $a_{js}$ is the factor loading of the jth test on the sth factor.
It appears that Carroll was motivated in writing (1) primarily by a close
inspection of Thurstone's five formal rules for simple structure ([12], p. 335),
particularly the requirement that a large loading for one factor be opposite
a small loading for another factor.

In his original paper, Carroll provided two numerical examples of the
application of his method. Without the restriction of orthogonality, these
illustrations gave somewhat equivocal results: while the application of (1)
appears to bring one close to the desired simple structure, the criterion has
an obvious bias in being too strongly influenced by factorially complex tests.

In the light of later developments, Carroll’s criterion should probably 
be relegated to the limbo of “near misses”; however, this does not detract 
from the fact that it was the first attempt to break away from an inflexible 
devotion to Thurstone’s ambiguous, arbitrary, and mathematically un- 
manageable qualitative rules for his intuitively compelling notion of simple 
structure. 

Almost simultaneously with Carroll's development, Neuhaus and
Wrigley [7], Saunders [8], and Ferguson [2] proposed what is usually called
the quartimax method for orthogonal simple structure. Neuhaus and Wrigley
suggest that a most easily interpretable factor matrix, in the simple structure
sense, may be found when the variance of all nr squared loadings of the
factor matrix is a maximum, i.e.,

(2)    $q_1 = \left[ nr \sum_j \sum_s (a_{js}^2)^2 - \left( \sum_j \sum_s a_{js}^2 \right)^2 \right] / n^2 r^2 = \text{maximum}.$

Saunders' approach requires that the kurtosis (fourth moment over second
moment squared) of all loadings and their reflections be a maximum,

(3)    $q_2 = nr \sum_j \sum_s a_{js}^4 \Big/ \left( \sum_j \sum_s a_{js}^2 \right)^2 = \text{maximum}.$

While Ferguson, basing his rationale on certain parallels with information
theory, calls simply for

(4)    $q_3 = \sum_j \sum_s a_{js}^4 = \text{maximum}.$

All these investigators are concerned with attaining a factor matrix
with a maximum tendency to have both small and large loadings. While
less obviously related to Thurstone's rules than Carroll's criterion, the
emphasis on small loadings coincides with Thurstone's requirements [...]
zero loadings. For orthogonal factors, criteria (2), (3), and (4) are [...]
the invariance of the sum of the [...]. (This term, as well as other constants,
disappears when differentiated [...] involved in finding the required critical point.)

[...] it turns out that they are also equivalent to Carroll's criterion
[...] orthogonal case. Minimizing (1) is equivalent to maximizing (4)
since the squared communality of a test is

    $\text{constant} = \left( \sum_s a_{js}^2 \right)^2 = \sum_s a_{js}^4 + 2 \sum_{s<t} a_{js}^2 a_{jt}^2,$

and the sum of squared communalities over all tests is

    $\text{constant} = \sum_j \sum_s a_{js}^4 + 2 \sum_j \sum_{s<t} a_{js}^2 a_{jt}^2 = q_3 + 2f.$

Thus, since the quartimax criterion plus twice Carroll's criterion is a
constant, maximizing $q_3$ is equivalent to minimizing $f$.

Neuhaus and Wrigley realized that none of these criteria can be real[...]
without the aid of an electronic computer; the calculations
are [...] extensive for a desk calculator or punched card mechanical
[...]. Consequently, they programmed the quartimax method for the
Illiac* and provided a rather extensive numerical investigation of the empirical
properties of the quartimax method.
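
The relation between Carroll's criterion (1) and Ferguson's form (4) can be checked numerically. The following Python sketch is an illustration only and is not taken from the paper: the loading matrix `A` and the helper names `carroll_f`, `quartimax_q`, and `rotate2` are hypothetical. It evaluates both criteria for a two-factor matrix under several orthogonal rotations and confirms that (4) plus twice (1) stays equal to the sum of squared communalities.

```python
import math

def carroll_f(A):
    """Carroll's criterion (1): sum over factor pairs s<t of sum_j a_js^2 * a_jt^2."""
    r = len(A[0])
    return sum(
        row[s] ** 2 * row[t] ** 2
        for row in A
        for s in range(r)
        for t in range(s + 1, r)
    )

def quartimax_q(A):
    """Ferguson's form (4): sum of fourth powers of all loadings."""
    return sum(a ** 4 for row in A for a in row)

def rotate2(A, theta):
    """Orthogonally rotate a two-factor loading matrix through angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[x * c + y * s, -x * s + y * c] for x, y in A]

# Hypothetical two-factor loadings for four tests (assumed data, not from the paper).
A = [[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.3, 0.6]]

# Sum over tests of the squared communality; unchanged by orthogonal rotation.
total = sum(sum(a * a for a in row) ** 2 for row in A)

for theta in (0.0, 0.3, 1.0):
    B = rotate2(A, theta)
    # (4) + 2 * (1) equals the invariant total at every rotation angle.
    assert abs(quartimax_q(B) + 2 * carroll_f(B) - total) < 1e-12
```

Because the total is fixed, any rotation that raises the value of (4) must lower the value of (1) by half as much, which is the equivalence stated in the text.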
[...] results were perhaps more encouraging than Carroll's. Under
the restriction of orthogonality Carroll's criterion (or the equivalent quartimax
method) does not show nearly so obvious a bias as does Carroll's criterion
when the restriction of orthogonality is removed. However, as an explication
of orthogonal simple structure, the quartimax method does have a systematic
bias which will be more fully examined in the next section.

The Varimax Criterion

At the outset, the above methods consider all nr loadings simultaneously.
In every case, however, these criteria may be applied separately
to [...] of the factor matrix and summed over rows for the final criterion
[...] of the invariance of the communalities. For example, Neuhaus and
Wrigley could have defined the simplicity, say, of the factorial composition
of the jth test as the variance of the squared loadings for this test,

(5)    $q_j^* = \left[ r \sum_s (a_{js}^2)^2 - \left( \sum_s a_{js}^2 \right)^2 \right] / r^2.$

*The Illiac is the University of Illinois electronic computer. Subsequently, the
quartimax criterion has been programmed for [...] (Neuhaus) and the IBM 701
(Kaiser). The varimax criterion described in the next two sections has been programmed
for the IBM 650 (Vandenberg), [...] (Comrey), the IBM 701 (Kaiser), Illiac
([...] Dickman), and [...]


To obtain the total criterion for the entire factor matrix, (5) could then be
summed over all tests to give

(6)    $q^* = \sum_j q_j^* = \sum_j \left[ r \sum_s (a_{js}^2)^2 - \left( \sum_s a_{js}^2 \right)^2 \right] / r^2.$

Maximizing $q^*$ is equivalent to maximizing $q_3$, again because constant terms
vanish when differentiated.

Equation (6) perhaps provides some insight into the quartimax 
criterion—its aim is to simplify the description of each row, or test, of the 
factor matrix. It is unconcerned with simplifying the columns, or factors, of 
the factor matrix (probably the most fundamental of all requirements for 
simple structure). The implication of this is that the quartimax criterion
will often give a general factor. Under requirement (5) there is no reason 
why a large loading for each test may not occur on the same factor. In practice, 
this tendency for the quartimax criterion to yield a general factor is most 
pronounced when the unrotated factor matrix has a pronounced general 
factor. 

From the simple structure viewpoint, an immediate modification of the
quartimax criterion is apparent. Let us define the simplicity of a factor as
the variance of its squared loadings,

(7)    $v_s^* = \left[ n \sum_j (a_{js}^2)^2 - \left( \sum_j a_{js}^2 \right)^2 \right] / n^2.$

And for the criterion for all factors, define the maximum simplicity of a
factor matrix as the maximization of

(8)    $v^* = \sum_s v_s^* = \sum_s \left[ n \sum_j (a_{js}^2)^2 - \left( \sum_j a_{js}^2 \right)^2 \right] / n^2,$

the variance of squared loadings by columns rather than by rows.
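
The contrast between the row-wise form (6) and the column-wise form (8) can be seen on two small artificial matrices. The Python sketch below is an illustration under stated assumptions: the function names `row_simplicity` and `raw_varimax` and both loading matrices are hypothetical, and the two matrices are not rotations of one another. They are chosen only to show what each criterion rewards.

```python
def row_simplicity(A):
    """Form (6): sum over tests of the variance of each row's squared loadings."""
    r = len(A[0])
    total = 0.0
    for row in A:
        sq = [a * a for a in row]
        total += (r * sum(x * x for x in sq) - sum(sq) ** 2) / r ** 2
    return total

def raw_varimax(A):
    """Form (8): sum over factors of the variance of each column's squared loadings."""
    n = len(A)
    total = 0.0
    for s in range(len(A[0])):
        sq = [row[s] ** 2 for row in A]
        total += (n * sum(x * x for x in sq) - sum(sq) ** 2) / n ** 2
    return total

# Assumed data: every test loads only on the first factor (a pure general factor).
general = [[0.8, 0.0], [0.8, 0.0], [0.8, 0.0], [0.8, 0.0]]
# Assumed data: two clean clusters, one per factor.
clusters = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]]

# The row-wise criterion scores both patterns identically ...
print(row_simplicity(general), row_simplicity(clusters))
# ... while the column-wise criterion is zero for the general-factor pattern.
print(raw_varimax(general), raw_varimax(clusters))
```

Each row in both matrices has one large and one zero loading, so the row-wise criterion cannot tell them apart; only the column-wise variance penalizes placing every large loading on the same factor.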

Since a factor is a vector of correlation coefficients, the most interpretable
factor is one based upon correlation coefficients which are maximally
interpretable. Those correlations which satisfy this condition are patently
obvious: correlations of ±1, which indicate a functional relationship, and
correlations of zero, which indicate no linear relationship. On the other hand,
middle-sized correlations are the most difficult to understand. Thus, it is
seen why $v_s^*$ in (7) could be maximized for the maximum interpretability or
simplicity of a factor, and more generally, why the interpretability of an
entire factor matrix could be considered best when (8) is a maximum.
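
To make the idea of maximizing (8) concrete, the following Python sketch searches over the single rotation angle available in the two-factor case. It is a brute-force grid search written for illustration, not the computational procedure of the paper; the names `rotate2`, `raw_varimax`, and `best_angle`, the grid size, and the loading matrix are all assumptions.

```python
import math

def raw_varimax(A):
    """Form (8): sum over factors of the variance of each column's squared loadings."""
    n = len(A)
    total = 0.0
    for s in range(len(A[0])):
        sq = [row[s] ** 2 for row in A]
        total += (n * sum(x * x for x in sq) - sum(sq) ** 2) / n ** 2
    return total

def rotate2(A, theta):
    """Orthogonally rotate a two-factor loading matrix through angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[x * c + y * s, -x * s + y * c] for x, y in A]

def best_angle(A, criterion, steps=3600):
    """Grid-search the angle in [0, pi/2) that maximizes the criterion."""
    best_theta, best_val = 0.0, criterion(A)
    for k in range(1, steps):
        theta = (math.pi / 2) * k / steps
        val = criterion(rotate2(A, theta))
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

# Assumed data: two clean clusters, then hidden by an arbitrary rotation of 0.5 radian.
clusters = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]]
mixed = rotate2(clusters, 0.5)

theta, value = best_angle(mixed, raw_varimax)
# The recovered matrix matches the clusters up to column order and sign.
recovered = rotate2(mixed, theta)
```

The search range stops at a quarter turn because a rotation of 90 degrees only interchanges the two columns and changes a sign, which leaves the criterion unchanged.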

Criterion (8) is the original raw varimax criterion [4]. In the original
proposal of this criterion, it was shown to be mathematically equivalent,
in the orthogonal case, to minimizing

(9)    $c^* = \sum_{s<t} \left\{ \left[ n \sum_j a_{js}^2 a_{jt}^2 - \left( \sum_j a_{js}^2 \right) \left( \sum_j a_{jt}^2 \right) \right] / n^2 \right\},$

i.e., minimizing the covariance of pairs of columns of squared loadings and
summing over all possible pairs of columns for the criterion. Criterion (9)
bears the analogous relationship to Carroll's criterion (1) that
criterion (8) does to the quartimax criterion (6).
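
The equivalence of maximizing (8) and minimizing (9) in the orthogonal case can be sketched in a few lines. The derivation below is a reconstruction offered for illustration, not a quotation from the paper; the shorthand $V$ for the quantity in (8), $C$ for the quantity in (9), $d_s$ for a column sum of squares, and $h_j^2$ for a communality are all notation assumed here.

```latex
% Assumed shorthand: d_s = \sum_j a_{js}^2 (column sum of squares),
%                    h_j^2 = \sum_s a_{js}^2 (communality of test j).
\begin{align*}
n^2 V   &= n \sum_s \sum_j a_{js}^4 \;-\; \sum_s d_s^2, \\
2 n^2 C &= 2n \sum_j \sum_{s<t} a_{js}^2 a_{jt}^2 \;-\; 2 \sum_{s<t} d_s d_t, \\
n^2 (V + 2C)
        &= n \sum_j \Bigl( \sum_s a_{js}^2 \Bigr)^2 \;-\; \Bigl( \sum_s d_s \Bigr)^2
         = n \sum_j h_j^4 \;-\; \Bigl( \sum_j h_j^2 \Bigr)^2 .
\end{align*}
```

The communalities are unchanged by orthogonal rotation, so the right-hand side of the last line is a constant, and any rotation that increases $V$ must decrease $C$. This mirrors the way the quartimax criterion plus twice Carroll's criterion is constant.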
The distinctions between quartimax and varimax orthogonal solutions
may be illustrated numerically. In Table 1 solutions for Thurstone's
eleven-variable box problem ([12], pp. 373-375) are given. It will be noted that the
quartimax solution [7] could hardly be called a simple structure. There is
a large general factor, and the second factor seems only vaguely concerned


Table 1 


Thurstone's 11-Variable Box Problem

                 Subjective        Quartimax         Raw Varimax
Test             X    Y    Z       X    Y    Z       X    Y    Z
* о 05 00 68 65 0 93 19 16 
oh 88 OL 83 47 00 o5 93 25 
E оз 0 79 j2 -08 79 n 17 88 
* во 63 -% وو‎ ш ا0‎ éL 7 2 
z о 9 5 n -0 56 o 6 75 
хЁ во ox -0 о ш 9 Bh n 2 
x? x 7 2 g a8 03 у 86 28 
2x + 2y 5з n -» 100 00 -07 sh 82 18 
(х2 + у2)} sg n -8 وو‎ -01 -07 53 a 18 
(х2 + ,2)* s2 -07 6 so 38 6 fo o» 17 
xyz 2 з 35 а oL 25 в 55 65 


Decimal points omitted. 
with the dimensions of boxes. On the other hand, the raw varimax solution
closely parallels Thurstone's original subjective solution, given the restriction
of orthogonality.

In Table 2 are solutions for Holzinger and Harman's 24 psychological
tests ([3], pp. 229-233). Both the quartimax [7] and raw varimax methods
seem to duplicate the subjectively rotated simple structure patterns. But
the respective variance contributions of the factors are perhaps more
interesting. It is [...] for the subjective solution
is less than for the two analytic methods. In other
words, Holzinger and Harman have made the factors a little more level or



Table 2 


Holzinger and Harman's Twenty-four Psychological Tests


Decimal points omitted.


even in their contribution to variance than the analytic criteria. Of the
two analytic criteria, the raw varimax solution has given a solution which is
closer in this respect to Holzinger and Harman's. It is also noteworthy that
as a result of these differences the large loadings of the factors with the larger
variance contributions for the analytic methods are larger than the large
loadings for the smaller factors, and similarly, the small loadings for the
larger factors are larger than the small loadings for the smaller factors.
Holzinger and Harman's subjective solution does not show this systematic
bias; their solution gives a more equitable patterning of factor loadings.

How this bias may be removed is indicated in the next section. This 
leads to a revision of the varimax criterion, which appears to have more 
important characteristics than merely satisfying the rules of simple structure- 

Factorial Invariance: Normal Varimax 

It seems reasonable to attribute the systematic bias seen in both the
quartimax and varimax solutions of the Holzinger-Harman data and other
examples [4] to the divergent weights which implicitly are attached to the
tests by their communalities. When one deals with fourth-power functions


سے 
      Subjective     Quartimax      Raw Varimax    Normal Varimax
Test  A  B  C  D     A  B  C  D     A  B  C  D     A  B  C  D
зт 19 @ о 2 20 65 13 11 67 17‏ 20 62 32 10 " 
от 15 là D 2L от 38 Ob 17 07 12 08 10 43 20‏ 2 
о 18 о 22 02 52 06 15 gh 08‏ 31 13 53 12 10 3 
от L6 -0 27 08 52 Ql 20 5L 07‏ 36 12 53 18 15 1 
à ш -02 -G 78 21 12 06 E gp om‏ 15 26 15 75 5 
ё 72 05 28 25 81 03 00 06 78 10 13 1h 75 23 a‏ 
à œ 27 11 85 07 -ol -10 8h 25 10 00 82 Bi 100‏ 7 
Sh 26 38 ш 6 20 20 -oL 6 2 X 05 sh 38 22‏ 8 
-OL 29 30 86 -06 -02 10 БЛ 01 12 19 60 22 25‏ 76 9 
Ш 22 10-2 n 17 71 -B 17 15 -56 9»‏ 19- 66 28 10 
a 27 61 -oL 29 31 6 01 25 22 63 06 29 1? 08 36‏ 
-à 06 70 23 О 02 235 g‏ 19 69 16 03 09 72 13 12 
2L 63 31 02 35 57 32 -08 2L 59 39 -0l 18 ш o‏ 13 
lh 23 19 -02 L8 32 19 -0 12 26 20 Ol 16 22 oL 20‏ 
ш 08 50 25 11 10 bs 17 n 13 18 12 ш 0‏ 1 15 
3h i5 29 13 37 37 пзш ы 08 a i‏ 22 05 16 
2h -03 62 28 2h 02 57 20 23 05 ё i 05 б‏ 15 17 
іт 00 3 32 51 00 3 3‏ 30 32 22 52 20 39 01 18 
2L А‏ 13 36 22 18 19 32 19 18 28 39 18 22 12 19 
Ш L2 i2 13 21 35 Ш 2‏ 35 09 52 29 16 18 3 20 
2h 35 38 35 ш 23 ш 0 20 15 he 56‏ 33 16 17 2 
lo lo 53 ош 30 26 Ш 06 37 22 3 ш 2‏ 12 31 22 
sh 25 55 19 Ш 09 L3 21 52 16 5 2 3‏ 29 3 23 
ET 39 16 ш X ig 13 10 20 Lo L6 18 27 En 2 2‏ 
Za 30 292 268 n6 559 c2 196 I2 ал 260 26b 186 350 г 308 236‏ 






of factor loadings, a test with communality [...] influence the [...] 0.3. Thus, while the most obvious weights have been applied to the tests, [...] roots of their communalities, after [...] a better set of weights, weights which would tend to equalize to a greater extent the relative influence of each test during rotation, [...] any form of correlational analysis. For the purposes of rotation, weight the tests equally, in the sense that the common parts of the test vectors have equal length. (The author is indebted to Dr. D. R. Saunders for this suggestion.) The varimax criterion could then be rewritten as

(10) $v = \sum_{s=1}^{r} \left\{ n \sum_{j=1}^{n} \left(a_{js}^2/h_j^2\right)^2 - \left(\sum_{j=1}^{n} a_{js}^2/h_j^2\right)^2 \right\} \Big/ n^2$,

where $h_j^2$ is the communality of the $j$th test. In contrast to (7) and (8), where the variance of the squared correlations of the tests with a factor is maximized, the variance of the squared correlations of the common parts of the tests (the projections of the tests onto the common-factor space) with a factor is now maximized. [Note from (10) that we are not advocating a permanent reweighting of the tests by a weight inversely as the square root of their communalities. During rotation this weighting extends the common part of each test vector to unit length, but after rotation each of these vectors is shortened to its proper length by reweighting directly as the square root of the test's communality.]

As will be seen in Table 2, under this modification the varimax criterion (the normal varimax, since rotation is with respect to normalized common parts of the tests) has effectively removed the small but disturbing bias in the raw varimax solution of Holzinger and Harman's example. It also has been shown for a number of other examples [6] that the normal varimax does not seem to deviate systematically from what may be considered the best orthogonal simple structure.*

Thus far, however, merely a numerical-intuitive basis for a weighting procedure which leads to "prettier" results has been provided. Such a basis is perhaps unsatisfactory theoretically. Indeed, this approach could conceivably lead to a different set of judgmentally determined weights for each particular example, a situation as scientifically reprehensible as the subjective graphical methods. Consider a more fundamental rationale for attempting to establish the normal varimax criterion (10) as a mathematical definition for the rotation problem. Consider the situation illustrated in Fig. 1. There are two clusters of tests, each of which is pure in the sense that the reflections of the test vectors of the cluster onto the two-dimensional, common-factor space are collinear. (While these clusters are drawn less than 90° apart, the following argument is perfectly general.)

*Professor Andrew Comrey has apparently reached the same conclusion in an extensive application of the normal varimax criterion to interitem correlation matrices of the MMPI (personal communication). Another example, available from the writer, is the normal varimax solution of Thurstone's classic PMA study [11] (dittoed).


[Figure 1. Case for which a normal varimax solution is invariant under changes in the composition of the test battery.]


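It is shown below that the angle of rotation in a plane which maximizes (10) is

(11) $\phi = \tfrac{1}{4} \arctan \dfrac{2\left[n \sum u_j v_j - \sum u_j \sum v_j\right]}{n \sum (u_j^2 - v_j^2) - \left[\left(\sum u_j\right)^2 - \left(\sum v_j\right)^2\right]}$,

where

$u_j = (a_{j1}/h_j)^2 - (a_{j2}/h_j)^2$,

and

$v_j = 2(a_{j1}/h_j)(a_{j2}/h_j)$.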
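As a minimal sketch of (11), assuming NumPy is available, the angle for one pair of factors might be computed as below. The names `rotation_angle` and `plane_criterion` are illustrative and do not come from the paper; the use of `arctan2` to keep track of the signs of numerator and denominator is a choice of this sketch.

```python
import numpy as np

def rotation_angle(a1, a2, h):
    """Angle of rotation (radians) from (11) for one pair of factors.

    a1, a2: loadings of the n tests on the two factors.
    h: square roots of the communalities (assumed nonzero).
    """
    x = np.asarray(a1, dtype=float) / h
    y = np.asarray(a2, dtype=float) / h
    n = x.size
    u = x**2 - y**2
    v = 2.0 * x * y
    num = 2.0 * (n * np.sum(u * v) - np.sum(u) * np.sum(v))
    den = n * np.sum(u**2 - v**2) - (np.sum(u)**2 - np.sum(v)**2)
    # arctan2 uses the signs of both terms, so the result lies
    # between -45 and +45 degrees.
    return 0.25 * np.arctan2(num, den)

def plane_criterion(a1, a2, h, phi):
    """Two-factor part of criterion (10), without the 1/n^2 constant,
    after rotating the normalized loadings through phi."""
    x = np.asarray(a1, dtype=float) / h
    y = np.asarray(a2, dtype=float) / h
    X = x * np.cos(phi) + y * np.sin(phi)
    Y = -x * np.sin(phi) + y * np.cos(phi)
    n = x.size
    return (n * np.sum(X**4) - np.sum(X**2)**2
            + n * np.sum(Y**4) - np.sum(Y**2)**2)
```

Evaluating `plane_criterion` on a grid of angles and comparing with its value at `rotation_angle(a1, a2, h)` is one way to check that the formula picks out the maximum.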


Let $n_A$ ($n_A \ge 1$) be the number of tests in the first cluster and $n_B$ ($n_B \ge 1$) be the number of tests in the second cluster ($n = n_A + n_B$). It is readily apparent that all tests of the first cluster have the same values for $u_j$ and $v_j$.




Let these values be $u_A$ and $v_A$. Similarly let the values for the second cluster be $u_B$ and $v_B$. In this case (11) reduces to

(12) $\phi = \tfrac{1}{4} \arctan \dfrac{2 n_A n_B (u_A v_A + u_B v_B - u_A v_B - u_B v_A)}{n_A n_B (u_A^2 + u_B^2 - v_A^2 - v_B^2 - 2 u_A u_B + 2 v_A v_B)}$.
Rags + us — vx — 08 — Duala + wavs) ` 


A most important result follows from (12): the angle of rotation does not depend on the number of tests in each cluster, i.e., for the case illustrated in Fig. 1, the normal varimax solution is invariant under changes in the composition of the test battery.

This property would seem to be of greater significance than the ability of the normal varimax solution to define mathematically the doctrine of simple structure. Although factor analysis seems to have many purposes, fundamentally it is addressed to the following problem: given an (infinite) domain of psychological content, infer the internal structure of this domain on the basis of a sample of n tests drawn from the domain. The possibility of success in such inferences is obviously dependent upon the extent to which a factor derived from a particular battery or sample of n tests approximates the corresponding unobservable factor in the infinite domain. If a factor is invariant under changing samples of tests, that is, shows factorial invariance ([12], pp. 360-361), there is evidence that inferences regarding domain factors are correct.
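The independence of the angle from the cluster sizes can be checked numerically. The sketch below is illustrative only: the function names and the cluster directions of 20° and 80° are assumptions of this example, not values from the paper. It evaluates (11) with the sums written out for two pure clusters and compares the result with (12).

```python
import math

def unit_uv(theta):
    """u and v of (11) for a normalized test vector at angle theta."""
    x, y = math.cos(theta), math.sin(theta)
    return x * x - y * y, 2.0 * x * y

def angle_from_12(uA, vA, uB, vB):
    """Equation (12) after cancelling the common factor n_A * n_B."""
    num = 2.0 * (uA * vA + uB * vB - uA * vB - uB * vA)
    den = uA**2 + uB**2 - vA**2 - vB**2 - 2.0 * uA * uB + 2.0 * vA * vB
    return 0.25 * math.atan2(num, den)

def angle_from_11(uA, vA, uB, vB, nA, nB):
    """Equation (11) for nA tests in cluster A and nB tests in cluster B."""
    n = nA + nB
    sum_u = nA * uA + nB * uB
    sum_v = nA * vA + nB * vB
    sum_uv = nA * uA * vA + nB * uB * vB
    sum_u2_minus_v2 = nA * (uA**2 - vA**2) + nB * (uB**2 - vB**2)
    num = 2.0 * (n * sum_uv - sum_u * sum_v)
    den = n * sum_u2_minus_v2 - (sum_u**2 - sum_v**2)
    return 0.25 * math.atan2(num, den)

# Hypothetical clusters at 20 and 80 degrees from the first reference axis.
uA, vA = unit_uv(math.radians(20.0))
uB, vB = unit_uv(math.radians(80.0))
for nA, nB in [(1, 1), (3, 7), (10, 2)]:
    print(nA, nB, math.degrees(angle_from_11(uA, vA, uB, vB, nA, nB)))
print(math.degrees(angle_from_12(uA, vA, uB, vB)))
```

For these directions every line reports a rotation of about 5°, which places the two clusters symmetrically, 15° inside each rotated axis, whatever the battery composition.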
The normal varimax solution, according to the above result, allows such inferences. Regardless of the sampling of tests, for the problem shown in Fig. 1 it is possible to infer precisely the domain normal varimax factors. This is not so for either the quartimax or raw varimax solutions since the angle of rotation is a function of $n_A$ and $n_B$.

The domain normal varimax factors are not said to be more meaningful than domain factors according to some different criterion; it is simply that observed normal varimax factors will have a greater likelihood of portraying the corresponding domain factors.

Although one often gets the impression that simple structure is the ultimate criterion of a rotational procedure, it is suggested that the ultimate criterion is factorial invariance. The normal varimax criterion was originally devised solely for the purpose of satisfying the simple structure criteria. But the fact that it shows this sort of invariance suggests that Thurstone's reasoning [...]. The principle of simple structure may be incidental to the more fundamental concept of factorial invariance. This viewpoint renders meaningless the arguments concerning "psychological reality" of general factors, bipolar factors, simple structure factors, etc.

[...] the result (12) is [...] within each of the [...] Spearman matrix, and the reduced correlation matrix is of rank two. Normal varimax does however give invariant results for two such Spearman clusters simultaneously, and consequently the normal varimax criterion is a two-dimensional generalization of the classic Spearman case. Obviously, the next step would be a generalization along the same lines to the r-dimensional case; thus far, however, work on this problem suggests virtually insuperable mathematical difficulties.

To investigate numerically the tendency of the criterion to give factorially
invariant solutions for r larger than two, again consider Holzinger and
Harman's empirical data. Taking their centroid loadings, the first five tests
were rotated, then the first six, etc., systematically until the analysis of
all 24 tests was reached, as in Table 2. The results of this application of the
normal varimax criterion are given in Table 3. There, the normal varimax
loadings for the four factors as a function of the changing number of tests
are given.

Note that factors A and C are essentially invariant from the outset;
the loading changes, while somewhat systematic, are negligible; 24 appears
to be essentially infinite. On the other hand, factors B and D show more
movement before they become stable. The reason for this is readily apparent.
For both B and D, this movement occurs only while they are underdetermined,
i.e., only while they contain no appreciable loadings. However, once they
pick up a test or two with high loadings, their ambiguous definition abruptly
stops, and they settle down to exhibit a degree of invariance similar to the


Table 3A


Normal Varimax Loading Changes for Holzinger and Harman's Factor A (n = 5, 6, ..., 24)


n
Test 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
ү 19 19 18 19 18 16 16 16 16 16 15 16 15 16 16 15 1; 15 Шш 
2 i2 32? 12 13 12 12 12 12 12 à à n à o; ux uu 0 
3 15 16 15 16 16 17 17 17 17 16 16 16 16 17 17 16 18 15 1 
Г az mn ao 2 2 21 2 annu а 2 а а a 2 2 20 20 
P 79 18 18 79 18 16 75 16 16 16 7 75 75 176 16 15 75 15 15 
@ m "n m 18 77 16 m m 16 75 175 15 15 15 1 1: 15 1 
1 83 Bh 83 82 82 вә вә вә в 82 82 603 82 62 32 92 C? 
И 89 58 5 5 SS ss 55 5 55 55 55 55 gs 55 55 2 
82 83 82 83 83 & ĉo бо 80 60 бо 50 60 8 
3 18 15 17 17 25 16 16 15 16 16 16 25 25 18 
10 19 21 21 19 18 18 18 18 18 18 13 18 18 
n оз o оз оз оз оз o o o оз 9 9 
as 16 19 19 20 20 20 2 19 15 29 2 
11 23 21 22 21 22 22 22 22 22 12 
T n 12 n 12 12 12 12 12 о 
09 09 10 09 o9 oF 0? 
16 15 1 ih ii d ш 2 
17 à о оо o) 00 © 
18 1з 13 1; BB 
19 35 35 X 25 
20 15 16 16 
2 36 26 
22 35 
23 




Table 3B


Normal Varimax Loading Changes for Holzinger and Harman's Factor B (n = 5, 6, ..., 24)


n
Test 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1 10 20 
2 i0 10 10 0; 12 16 19 2 2 20 19 19 3 16 19 19 19 1 
3 E. à о o 02 оз o5 от 08 O8 O7 OF OF o; 06 07 07 07 d 
i 2 -07 -07 -06 -Oh -ok -01 01 02 03 0? 02 02 02 02 02 02 02 02 
5 365 -03 -02 -01 - 02 05 07 o9 09 09 09 09 08 08 08 09 09 08 
5 i 03 05 Ol -00 19 22 21 22 2 21 21 20 20 20 20 20 21 20 
7 -03 -02 -05 оз 10 1h 12 B M 10 10 û9 08 08 08 09 09 09 
8 203 -03 -08 12 16 15 16 16 16 16 15 15 15 15 15 16 16 
9 10 02 21 2h 25 26 26 26 26 5 25 25 25 25 25 25 
10 оз 03 08 05 0 03 02 Ol 00 -00 -01 -01 -00 00 -00 
n з 1 т ът п т 0 69 69 69 69 70 70 
12 68 67 6: & 62 6 6 60 59 59 60 6 60 
13 @ & 6 ө 69 69 69 6 69 69 69 6 
1l 57 59 59 59 59 59 59 59 59 59 59 
15 2 19 19 16 15 15 Ш 15 is 15 
16 10 10 07 06 06 o6 06 06 06 
17 i2 10 09 08 08 09 09 09 
18 19 17 17 iz xor rx 
19 26 25 25 26 26 26 
3 ih ih 24 15 15 
21 o9 10 10 10 
$5 т 3 3 
25 Oh Ol 
20 


Decimal points omitted.


Table 3C

Normal Varimax Loading Changes for Holzinger and Harman's Factor C (n = 5, 6, ..., 24)

n
Test 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
5 КЖЕ 
EE ao eR ER ES BERE 
з Ce Щщ Бшш b E Шш m X À x а а 
Г : па à n 5» d u su Sk 5 
|o» hh hOB а аа 2o» à $ i 2b 20 20 
5 23 2 55 55 55 55 оя а 2 a 0 юю v n OEC. 
i CTT ETE @ 2n 25 16 18 19 19 b 
( CRERRARREYT EE SS ees 
P SRR RRL ERS aaa as 
: g oj X 7 7 07 07 07 
ob ol -03 06 06 o о 
р Шило д 5 2 22 
13 25 r К g № и ы Ш Bod 
i 08 05 o, 05 93 02 03 3 A 
i 15 ib 15 13 12 13 13 ü 
15 » $ ы юш x E 
ni ш ов Oh o5 06 © 
18 $ on nm 
22 53 3 23 
E goa d 
a ш جا‎ 
22 




Table 3D
Normal Varimax Loading Changes for Holzinger and Harman's Factor D (n = 5, 6, ..., 24)
n
Test 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1 о -0 оо 0 -07 OL 17 17 
2 o о 02 o 00 02 10 16 
3 -00 01 01 -00 05 -00 08 07 
m -03 -03 -03 -oL -0l -Ok n M 
5 -00 -06 -03 -02 -09 -06 li i 
a 06 C9 09 05 05 NES 
1 -06 -06 -06 -10 o 9% 
8 -0L -h -07 17 25 
9 15 11 ?5 2, 
10 -00 д 36 
п 37 1 
12 n о 
13 or 06 
1h £o 9 
15 50 50 
16 ш a 
17 [2n 1 
18 che 5 
19 » 27 
a X* 
% % 36 
23 22 22 
2l 3h 


"Decimal points omitted. 


other two factors, which had high loadings from the beginning. For n = 24
there appear to be good approximations to the domain normal varimax
factors.


The Oblique Case 

If the restriction of orthogonality is relaxed, it is impossible to apply
directly the quartimax criterion (4) or the normal varimax criterion (10).
This is because interfactor relationships are not considered when the criteria
are in this form, and when applied all factors will collapse into the same
factor, that one factor which best meets the criterion. However, Carroll's
version of the quartimax criterion explicitly considers interfactor relationships
and an oblique solution is attainable. As suggested by (9), if

(13) $c = \sum_{s=1}^{r-1} \sum_{t=s+1}^{r} \left\{ n \sum_{j} \left(a_{js}^2/h_j^2\right)\left(a_{jt}^2/h_j^2\right) - \left(\sum_{j} a_{js}^2/h_j^2\right)\left(\sum_{j} a_{jt}^2/h_j^2\right) \right\} \Big/ n^2$,

it may be shown that in the orthogonal case $v = -2c$. This alternative form
of the normal varimax may then be used to obtain oblique factors. The
mathematical problem of minimizing (13) is exactly analogous to Kaiser's
[5] treatment for Carroll's criterion. Computationally, the (iterative) solution
involves finding the latent vector associated with the smallest latent root of
a constantly changing symmetric matrix of order r.
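The orthogonal-case identity $v = -2c$ can be verified numerically. The following is a sketch under stated assumptions: NumPy is available, the function names are hypothetical, both criteria carry the same $1/n^2$ factor, and the communality of each test is taken as the sum of its squared loadings over all r factors.

```python
import numpy as np

def normalized_squares(loadings):
    """Squared loadings divided by each test's communality (row sum of squares)."""
    A = np.asarray(loadings, dtype=float)
    return A**2 / np.sum(A**2, axis=1, keepdims=True)

def v_criterion(loadings):
    """Normal varimax criterion, assuming the 1/n^2 form of (10)."""
    B2 = normalized_squares(loadings)
    n = B2.shape[0]
    per_factor = n * np.sum(B2**2, axis=0) - np.sum(B2, axis=0)**2
    return float(np.sum(per_factor) / n**2)

def c_criterion(loadings):
    """Cross-product form (13), summed over all pairs of distinct factors."""
    B2 = normalized_squares(loadings)
    n, r = B2.shape
    total = 0.0
    for s in range(r - 1):
        for t in range(s + 1, r):
            total += (n * np.sum(B2[:, s] * B2[:, t])
                      - np.sum(B2[:, s]) * np.sum(B2[:, t]))
    return float(total / n**2)

# Hypothetical orthogonal loading matrix: 10 tests, 4 factors.
A = np.random.default_rng(0).normal(size=(10, 4))
print(v_criterion(A), -2.0 * c_criterion(A))
```

The two printed numbers agree up to rounding, because the normalized squared loadings of each test sum to one across the factors.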




Computational Appendix

To compute an orthogonal normal varimax solution, the following procedure is suggested. The first step is to normalize the rows of the arbitrary reference factor matrix (e.g., principal axes or centroids) by dividing each element by $h_j$. Rotation to the direction of the normal varimax factors may then be carried out with respect to these normalized loadings.

The criterion (10) will be applied to two factors at a time. For this purpose, the following notation for an orthogonal rotation is convenient.

$\begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} = \begin{bmatrix} X_1 & Y_1 \\ X_2 & Y_2 \\ \vdots & \vdots \\ X_n & Y_n \end{bmatrix}$

where $x_j$ and $y_j$, the present normalized loadings, are constants, and $X_j$ and $Y_j$, the desired normalized loadings, are functions of $\phi$, the angle of rotation.

It is immediately seen that

(14) $X_j = x_j \cos\phi + y_j \sin\phi$,

(15) $Y_j = -x_j \sin\phi + y_j \cos\phi$.

Thus,

(16) $dX_j/d\phi = Y_j$,

(17) $dY_j/d\phi = -X_j$.

According to (10), in this plane,

(18) $n \sum (X_j^2)^2 - \left(\sum X_j^2\right)^2 + n \sum (Y_j^2)^2 - \left(\sum Y_j^2\right)^2$

must be a maximum. Differentiating (18) with respect to $\phi$, using (16) and (17), and setting the derivative equal to zero,

(19) $n \sum X_j Y_j (X_j^2 - Y_j^2) - \sum X_j Y_j \sum (X_j^2 - Y_j^2) = 0$.

To solve (19) for $\phi$ in terms of $x_j$ and $y_j$, substitute the values for $X_j$ and $Y_j$ from (14) and (15), consult a table of trigonometric identities, and, after a good deal of algebraic manipulation,

(20) $\phi = \tfrac{1}{4} \arctan \dfrac{2\left[n \sum (x_j^2 - y_j^2)(2 x_j y_j) - \sum (x_j^2 - y_j^2) \sum 2 x_j y_j\right]}{n \sum \left[(x_j^2 - y_j^2)^2 - (2 x_j y_j)^2\right] - \left\{\left[\sum (x_j^2 - y_j^2)\right]^2 - \left[\sum 2 x_j y_j\right]^2\right\}}$.

If $u_j = x_j^2 - y_j^2$ and $v_j = 2 x_j y_j$, (20) reduces to (11) above.

Of course, (11) or (20) is only a necessary condition for a maximum. By taking the second derivative of (18), the ranges of $\phi$ which correspond to a maximum may be found. These are summarized below.


                          sign of numerator
                          +                   −
sign of          +    0° to 22½°         0° to −22½°
denominator      −    22½° to 45°        −22½° to −45°

The sign of numerator and denominator refer to the right-hand member
of (20); the values in the cells refer to $\phi$.

These single-plane rotations are made on factors 1 with 2, 1 with 3, ..., 1
with r, 2 with 3, ..., 2 with r, ..., (r − 1) with r, 1 with 2, ... iteratively
until r(r − 1)/2 successive rotations of $\phi = 0$ are obtained, i.e., until the
process converges. (It was shown [6] that v in (10) cannot be greater than
(r − 1)/r, and since each successive application of (20) can result only in
a non-decrease of v, this iterative procedure must converge.) After convergence,
each normalized test vector is restored to its proper length by
multiplying by $h_j$.
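The procedure of this appendix might be sketched as follows. This is an illustration under stated assumptions, not the author's program: it assumes NumPy, the name `normal_varimax` and the parameters `tol` and `max_sweeps` are hypothetical, and a small angle tolerance stands in for "successive rotations of zero". `arctan2` supplies the quadrant rule of the table above.

```python
import numpy as np

def normal_varimax(loadings, tol=1e-8, max_sweeps=100):
    """Orthogonal normal varimax rotation by successive plane rotations.

    loadings: n x r reference factor matrix (every row assumed nonzero).
    Returns the rotated loadings, restored to their original row lengths.
    """
    A = np.asarray(loadings, dtype=float)
    h = np.sqrt(np.sum(A**2, axis=1))      # square roots of communalities
    B = A / h[:, None]                     # normalize each test vector
    n, r = B.shape
    for _ in range(max_sweeps):
        largest = 0.0
        for s in range(r - 1):             # factors 1 with 2, 1 with 3, ...
            for t in range(s + 1, r):
                x = B[:, s].copy()
                y = B[:, t].copy()
                u = x**2 - y**2
                v = 2.0 * x * y
                num = 2.0 * (n * np.sum(u * v) - np.sum(u) * np.sum(v))
                den = (n * np.sum(u**2 - v**2)
                       - (np.sum(u)**2 - np.sum(v)**2))
                phi = 0.25 * np.arctan2(num, den)   # equation (20)
                c, sn = np.cos(phi), np.sin(phi)
                B[:, s] = x * c + y * sn            # equation (14)
                B[:, t] = -x * sn + y * c           # equation (15)
                largest = max(largest, abs(phi))
        if largest < tol:                  # a full sweep of near-zero angles
            break
    return B * h[:, None]                  # restore proper lengths
```

Because every step is an orthogonal rotation of two columns, the communalities of the returned matrix equal those of the input; only the distribution of each test's variance over the factors changes.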

Since this article was accepted for publication, the author has prepared
a detailed outline for coding an electronic computer program for the varimax
criterion. This (dittoed) paper is available from the writer.




REFERENCES 

[1] Carroll, J. B. An analytical solution for approximating simple structure in factor analysis. Psychometrika, 1953, 18, 23-38.

[2] Ferguson, G. A. The concept of parsimony in factor analysis. Psychometrika, 1954, 19, 281-290.

[3] Holzinger, K. J. and Harman, H. H. Factor analysis. Chicago: Univ. Chicago Press, 1941.

[4] Kaiser, H. F. An analytic rotational criterion for factor analysis. Amer. Psychologist, 1955, 10, 438. (Abstract)

[5] Kaiser, H. F. Note on Carroll's analytic simple structure. Psychometrika, 1956, 21, 89-92.

[6] Kaiser, H. F. The varimax method of factor analysis. Unpublished doctoral dissertation, Univ. California, 1956.

[7] Neuhaus, J. O. and Wrigley, C. The quartimax method: an analytical approach to orthogonal simple structure. Brit. J. statist. Psychol., 1954, 7, 81-91.

[8] Saunders, D. R. An analytic method for rotation to orthogonal simple structure. Princeton: Educational Testing Service Research Bulletin 53-10, 1953.

[9] Thomson, G. H. The factorial analysis of human ability. (5th ed.) New York: Houghton Mifflin, 1951.

[10] Thurstone, L. L. Theory of multiple factors. Ann Arbor: Edwards Bros., 1932.

[11] Thurstone, L. L. Primary mental abilities. Psychometric Monogr. No. 1, 1938.

[12] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.


Manuscript received 11/15/57
Revised manuscript received 1/9/58


PSYCHOMETRIKA— VOL. 23, No, 3 
SEPTEMBER, 1958 


POWER FUNCTION CHARTS FOR SPECIFICATION OF
SAMPLE SIZE IN ANALYSIS OF VARIANCE


LEONARD S. FELDT
STATE UNIVERSITY OF IOWA
AND
MOHARRAM W. MAHMOUD
EGYPTIAN MINISTRY OF EDUCATION


The specification of sample size is an important aspect of the planning
of every experiment. When the investigator intends to use the techniques of
analysis of variance in the study of treatments effects, he should, in specifying
sample size, take into consideration the power of the F tests which will
be made. The charts presented in this paper make possible a simple and
direct estimate of the sample size required for F tests of specified power.


A primary consideration in the design of any experiment is the specification of the number of subjects to be selected from the various treatment populations. This number should be such that the important statistical tests will be reasonably sensitive in detecting false null hypotheses. Statistical theory provides the basis for designing such tests: in many psychological and educational experiments sufficient preliminary information is available to permit an application of this theory. The purpose of this paper is to provide power function charts which will simplify the application of the theory and thus facilitate the specification of sample size in experiments employing the techniques of analysis of variance.

The power of the statistical test in any experimental setup, that is, the probability of rejecting the null hypothesis when it is false, depends on the level of significance $\alpha$ at which the test is made, the number of observations or subjects $n$ on which data are available, and the degree of falsity $\phi'$ of the hypothesis under test. The latter factor is defined as the square root of the ratio of the variance of the treatment population means to the variance of error within the treatment populations. Symbolically,

$\phi' = \sqrt{\sum_{i=1}^{k} (\mu_i - \mu)^2 / k} \,\Big/\, \sigma$.

For every F test at a given level of significance in any given design, the power P against any specified alternative to the null hypothesis is uniquely determined by the value of $n$. Conversely, for every test there exists a value of $n$ which will result in a test of specified power against a specified alternative. In those experiments in which the power requirements of the F test can be rationally fixed against a specific alternative, it is possible to determine the necessary sample size. It is for such situations that the present charts are intended.


Nature of the Charts 


The charts presented in this paper are for use with tests of the main effects of treatments in experiments involving two to five levels of the treatment variable. The charts are strictly valid only for the completely randomized design; however, they may be applied with relatively little error to tests of treatments effects in randomized block designs and factorial designs employing a within-cells estimate of error variance. A chart presents two families of three curves each. The families pertain to the .05 and .01 levels of significance; the curves within families correspond to power values of .5, .7 and .9.

A separate chart is provided for each value of $k$, the number of levels of the treatment variable ($f_1 = k - 1$), from 2 through 5. The chart and family appropriate for a given experimental test is entered with the parameter $\phi'$ along the abscissa. The value of $n$, the number of observations required per treatment for a test of specified power, is read directly from the ordinate of the chart.


Historical Development 


The distribution of the F statistic under hypotheses alternative to the null hypothesis was first [...] by Fisher [1] and Wishart [9], who [...] F distribution in the form of the [...] correlation ratio. Later Tang [8] derived the same result from the distribution [...]. Tang also presented extensive tables of the power function. These tables are entered with the parameter $\phi$, defined as

$\phi = \dfrac{\sqrt{n}}{\sigma} \sqrt{\sum_{i=1}^{k} (\mu_i - \mu)^2 / k}$.

[...] retaining a false null [...] interval of tabulation [...] satisfactory interpolation.

Following Tang's procedure, Lehmer [4] tabulated the values of $\phi$ for $\alpha$ = .05 and .01, P = .7 and .8 over a wide range of $f_1$ and $f_2$. These tables [...] in the power range considered; however they cannot [...] in the planning of experiments. From the tables the [...] test will have a power less than .7, [...] specified alternative. A greater [...] tables considerably more useful.


[Figure 1. Curves of Constant Power for the Test of Main Effects with k = 2. Abscissa: φ′, with separate scales for α = .05 and α = .01; ordinate: n.]

[Figure 2. Curves of Constant Power for the Test of Main Effects with k = 3. Abscissa: φ′, with separate scales for α = .05 and α = .01; ordinate: n.]

[Figure 3. Curves of Constant Power for the Test of Main Effects with k = 4. Abscissa: φ′, with separate scales for α = .05 and α = .01; ordinate: n.]

[Figure 4. Curves of Constant Power for the Test of Main Effects with k = 5. Abscissa: φ′, with separate scales for α = .05 and α = .01; ordinate: n.]




Patnaik [6] made an extensive study of the power function of analysis
of variance tests. By the method of moments, he derived an approximation
to the noncentral F distribution based on the central F distribution. This
approximation is computationally feasible but somewhat tedious, especially
if a number of power estimates are required. Its primary limitation to psychological
experimenters is the labor involved in utilizing Pearson's Tables of
the Incomplete Beta Function in obtaining power values. This limitation is
especially marked in the many instances which demand interpolation within
these tables.

Pearson and Hartley [7] presented families of power curves for various
combinations of $\alpha$, $f_1$, and $f_2$, which make possible a direct estimate of the
power of analysis of variance tests. These curves, like the tables of Tang,
are entered with the parameter $\phi$. For any given experimental setup, the
power of the test may be read directly from the ordinate of the curve. These
curves are well suited to the evaluation of the power of any given test. They
cannot be easily employed in the inverse manner, however, to indicate the
value of $n$ which should be adopted in order to secure a test of specified
power. For this purpose, the experimenter must adopt the relatively inefficient
approach of making repeated approximations until the value of $n$
has been estimated with sufficient accuracy.

Nicholson [5] and Hodges [3] have derived general formulas for the computation of the power of the analysis of variance test when $f_2$ is an even number. The formulas involve the evaluation of terms in a [...], the number of terms being dependent on the number of degrees of freedom for error. The latter feature is a serious practical limitation, for when $f_2$ is greater than 20, as it often is in psychological experiments, the evaluation becomes too laborious to be of practical utility.

Fox [2] developed charts which [...] some of the objections to earlier works and facilitate the determination of sample size. These charts were constructed from the tables of Tang and [...] graphs of constant $\phi$ for varying [...]. By a method of successive approximations, the value of $n$ [...] for a fixed value of $\alpha$ and a fixed value of P against a specified [...] hypothesis. These charts are somewhat laborious to apply, however, because of the iterative nature of the approximation for $n$. Also, they do not extend below $f_1$ = 3. For psychological experimenters, who typically deal with fixed treatments effects, this limitation considerably restricts their usefulness.

In theory, the problem of evaluating the power of the test of treatments effects in the simpler analyses of variance has been completely solved. Exact, approximate, and graphical solutions have been derived. However, neither [...] make possible a simple, direct, noniterative approximation of the sample size [...] a test of specified power. The charts presented in this paper permit a direct approximation for $n$, and hence [...] experimenters.




Construction of the Charts 


Each chart in this series presents, for $\alpha$ = .05 and .01, a family of three curves which correspond to P = .5, .7, and .9. The numerical calculations for the coordinates of the points on the curve P = .7 were carried out from the tables of Lehmer; the calculations for the remaining curves were based on data read from the charts of Pearson and Hartley. The three basic steps in the calculations were as follows.

(1) Determine (from chart or table) pairs of values for $\phi$ and $f_2$ for a specified value of P, $f_1$, and $\alpha$.

(2) Solve $f_2$ for $n$ from the relationship $n = 1 + f_2/k$, where $k$ is the number of treatments and $n$ the number of observations per treatment.

(3) Divide $\phi$ by $\sqrt{n}$ to obtain $\phi'$.

The pairs of coordinates for $n$ and $\phi'$ were then plotted and smooth curves fitted through the points.
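Steps (2) and (3) amount to a two-line conversion, sketched below. The function name `chart_point` and the input values in the usage line are hypothetical; in particular, the pair $\phi$ = 2.0, $f_2$ = 20 is an arbitrary illustration, not a value taken from the tables of Tang or Lehmer.

```python
import math

def chart_point(phi, f2, k):
    """Convert a tabled pair (phi, f2) into a chart coordinate (n, phi_prime).

    Assumes the completely randomized design, for which n = 1 + f2 / k.
    """
    n = 1.0 + f2 / k                 # step (2)
    phi_prime = phi / math.sqrt(n)   # step (3)
    return n, phi_prime

# Arbitrary illustrative input: phi = 2.0 with f2 = 20 and k = 2 treatments.
n, phi_prime = chart_point(2.0, 20, 2)
print(n, round(phi_prime, 3))
```

With these inputs the conversion gives n = 11 observations per treatment, the same correspondence between n = 11 and $f_2$ = 20 that the paper uses for k = 2.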


[...] compare the level of mastery reached by [...] who memorize [...] a list of paired adjectives under three levels of motivation. From a tentative theoretical formulation of [...] following array of means [...] three levels: [...] Against this alternative to [...] the experimenter wishes a power of .90 for a test made at the 5 per cent level. Previous experimentation with this list has given rise to an error variance of 100.0, a value which may be taken as a population parameter for this purpose.

From the array of differences the variance of the means can be computed equal to 10.89. $\phi'$ is therefore equal to .33. Entering Figure 2 with this value, the required number of subjects is equal to 40.
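The chart reading in this example can be cross-checked by direct computation with the noncentral F distribution. This is a sketch under stated assumptions and not the method used to construct the charts: it assumes SciPy is installed, the function names are hypothetical, and the noncentrality parameter is taken as $k\,n\,\phi'^2$, which follows from the definition of $\phi'$ for the completely randomized design.

```python
import math
from scipy.stats import f, ncf

def anova_power(n, k, phi_prime, alpha):
    """Power of the F test with k treatments and n observations each,
    assuming a completely randomized design."""
    f1 = k - 1
    f2 = k * (n - 1)
    nc = k * n * phi_prime**2              # noncentrality parameter
    f_crit = f.ppf(1.0 - alpha, f1, f2)    # critical value under the null
    return float(ncf.sf(f_crit, f1, f2, nc))

def required_n(k, phi_prime, alpha, power, n_max=1000):
    """Smallest n per treatment whose power reaches the target."""
    for n in range(2, n_max + 1):
        if anova_power(n, k, phi_prime, alpha) >= power:
            return n
    raise ValueError("target power not reached for n <= n_max")

# Values from the example: variance of means 10.89, error variance 100.0.
phi_prime = math.sqrt(10.89 / 100.0)
print(round(phi_prime, 2), required_n(3, phi_prime, 0.05, 0.90))
```

The search should return a value close to the 40 subjects per treatment read from Figure 2; small differences from a graphical reading are to be expected.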


Note on the General 
The charts presented in this paper are strictly valid only for the test of main effects in the completely randomized design. Values of $n$ read from these charts underestimate only slightly the values which would be required in the randomized block or factorial [...] designs.


The limited generality of the charts stems from the unique relationship which exists between $n$ and $f_2$ in each experimental design. For example, for the completely randomized design (the one used in the construction of these charts) the relationship is

$n = 1 + f_2/k$.

In the randomized block design

$n = 1 + f_2/(k - 1)$.

For the test of the factor with $k$ levels in the $k \times h$ factorial design (mean square within cells being used as the error term) the relationship is

$n = h + f_2/k$.
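The three relationships between $n$ and $f_2$ can be collected in one small helper. The function name and the design labels are hypothetical, chosen only for this sketch.

```python
def n_per_treatment(design, f2, k, h=None):
    """Observations per treatment level implied by the error degrees of freedom f2.

    design: "completely_randomized", "randomized_block", or "factorial".
    k: number of levels of the treatment variable.
    h: number of levels of the second factor (factorial design only).
    """
    if design == "completely_randomized":
        return 1 + f2 / k
    if design == "randomized_block":
        return 1 + f2 / (k - 1)
    if design == "factorial":
        if h is None:
            raise ValueError("h is required for the factorial design")
        return h + f2 / k
    raise ValueError("unknown design: " + design)

# Completely randomized design with k = 2 and f2 = 20.
print(n_per_treatment("completely_randomized", 20, 2))
```

For the same $n$, the three designs leave different degrees of freedom for error, which is why a chart drawn for one design is only approximate for the others.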


Because of these different relationships, the numerical relationship of $\phi$ to $\phi'$ varies from one design to another, and charts based on that which holds for the completely randomized design will be only approximately correct for the other setups. However, for values of $f_2 \ge 20$, the relationship of $\phi$ to $\phi'$ is almost identical for all three designs. We may demonstrate the relatively small error involved in using the present charts for planning randomized block and factorial designs by applying the charts to two examples which tend to maximize the extent of the inaccuracy. This occurs in the randomized block design when $k$ = 2; it occurs in the factorial design employing a within-cells error term and proportional frequencies when the number of measures per cell approaches 2. According to Figure 1, a completely randomized design involving two levels of the treatment variable and $\alpha$ = .05 will require $n$ = 11 ($f_2$ = 20) for P = .90 against $\phi'$ = .725. The value of $n$ which is actually needed in a randomized block design for P = .90 is approximately 12.0. The comparable value for a 2 × 6 factorial design is 11.9.

This discrepancy, which for even this extreme case is probably of little consequence in the planning of most experiments, is considerably smaller for larger values of $f_1$ and $f_2$. Therefore, for practical purposes of approximating the necessary sample size in randomized block and factorial experiments, the tables presented in this paper would seem sufficiently precise.


REFERENCES

[1] Fisher, R. A. The general sampling distribution of the multiple correlation coefficient. Proc. roy. Soc., A, 1928, 121, 654-673.

[2] Fox, M. Charts of the power of the F-test. Ann. math. Statist., 1956, 27, 481-497.

[3] Hodges, J. L. On the noncentral Beta distribution. Ann. math. Statist., 1955, 26, 648-653.

[4] Lehmer, E. Inverse tables of probabilities of errors of the second kind. Ann. math. Statist., 1944, 15, 388-398.

[5] Nicholson, W. L. A computing formula for the power of the analysis of variance test. Ann. math. Statist., 1954, 25, 607-610.

[6] Patnaik, P. B. The noncentral χ² and F-distribution and their applications. Biometrika, 1949, 36, 202-232.

[7] Pearson, E. S. and Hartley, H. O. Charts of the power function for analysis of variance tests, derived from the noncentral F-distribution. Biometrika, 1951, 38, 112-130.

[8] Tang, P. C. The power function of the analysis of variance test with tables and illustrations of their use. Statist. res. Memoirs, 1938, 2, 126-149.

[9] Wishart, J. A note on the distribution of the correlation ratio. Biometrika, 1932, 24, 441-450.


Manuscript received 9/25/57 


PSYCHOMETRIKA—VOL. 23, NO. 3
SEPTEMBER, 1958 


A COMPARATIVE STUDY OF THREE METHODS OF ROTATION 


BENJAMIN FRUCHTER 
UNIVERSITY OF TEXAS 
AND 
EDWIN Novak 
SYSTEMS DEVELOPMENT CORP. 


Three methods of rotation (the graphical, the Thurstone analytical,
and the direct-rotational) were applied to the matrix of centroid loadings
for 35 variables, to determine which method is the most efficient from
theoretical and practical standpoints. The direct-rotational method provided
the most information for determining the rank of the configuration and was
most economical with respect to time required to reach a rotational solution.
The analytical method required the least number of judgmental decisions
and was the most objective. The graphical method was the most laborious
but had a slight advantage with regard to the number of near-zero loadings
in the rotational solution.


Since 1932, the year in which L. L. Thurstone introduced rotational
procedures to factor analysis, many different methods of rotation have
been developed. A survey of the literature reporting factorial studies has
revealed that graphical methods have been employed most frequently to
accomplish the rotations even though these methods are relatively slow,
laborious, and require numerous skilled judgments. Until the present time,
however, they have been the most dependable methods. The majority of
the recent proposals for the solution of the rotational problem have been
analytical procedures with emphasis on objectivity.

Thurstone [9] and Carroll [1] have developed analytical methods by
which solutions may be obtained on orthogonal or [...]. Saunders
[8] and Neuhaus and Wrigley [7] have developed [...]
which yield orthogonal factors. With the exception of [...],
which requires judgment in selecting the initial [...], these
procedures are sufficiently systematic so that [...]
for them can be written. Differing in approach, [...] have been
advanced that are of particular theoretical interest [...]
to reduce the problem to a single rotation. Of [...]
by Harris [5], based upon geometric models, has shown promise
for future development.

This paper reports an investigation which compared three oblique
rotational solutions to a battery of 35 aptitude, achievement, and back-
ground variables [2]. The graphical, the Thurstone analytical, and the
direct-rotational methods were applied.

The study attempted to assess the validity of the following propositions
by comparing the results from the solutions derived by the three rotational
methods.

(1) The rank of the factor configuration is determined with greater
accuracy by the direct-rotational method than by graphical or analytical
methods.

(2) The solution obtained by the direct-rotational method yields a
result which is a closer approximation to simple structure than the results
obtained by the graphical and analytical methods.

(3) The solution by the direct-rotational method requires less time
to reach an approximation to simple structure than do the graphical and
analytical solutions.

(4) The direct-rotational method requires a smaller number of skilled
judgments than do the graphical or analytical methods.

Procedure

A comprehensive test battery was administered to 881 male, basic
airmen at Lackland Air Force Base, San Antonio, Texas. The mean chrono-
logical age of the airmen was 18 years; their mean educational attainment
was the eleventh grade.

Product moment correlations were obtained for 35 variables selected
from the Army General Classification Tests, the Airmen Classification Test
Battery, the Adjutant General Mechanical Aptitude Test, the Differential
Aptitude Tests, the Gray-Votaw General Achievement Tests, the Iowa High
School Content Examination, the Otis Quick-Scoring Mental Ability Test,
the Sims Score Card for Socio-Economic Status, and education in years.

The matrix of intercorrelations was factored by the centroid method.
Several empirical criteria, such as Tucker's Phi, Coombs' Criterion, and the
range of the centroid loadings, indicated the rank of the correlation matrix
to be eight. (The intercorrelation and centroid factor matrices have been
published in Fruchter [2].)
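
The first step of the centroid method mentioned above can be sketched in
Python. This is only an illustration under stated assumptions: the function
names and the small demonstration matrix are hypothetical, the diagonal is
assumed to hold the communality estimates already, and the sign reflections
needed before extracting later centroid factors are not shown.

```python
import math


def first_centroid_loadings(r):
    """Loadings on the first centroid factor of a square correlation matrix.

    Assumption: the diagonal of r already contains communality estimates
    (unities are used in the demonstration purely for illustration).
    """
    n = len(r)
    column_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(column_sums)
    return [s / math.sqrt(total) for s in column_sums]


def residual_matrix(r, loadings):
    """Remove one factor from r: each entry becomes r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical 3 x 3 correlation matrix, not data from the study.
r_demo = [[1.0, 0.5, 0.4],
          [0.5, 1.0, 0.3],
          [0.4, 0.3, 1.0]]
a1 = first_centroid_loadings(r_demo)
residuals = residual_matrix(r_demo, a1)
```

In this sketch the residual matrix sums to zero, which is why the later
factors of a centroid analysis require reflecting the signs of some variables
before the same column-sum step can be repeated.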


The Graphical Solution

[...] was plotted against every other column. Each plot was examined
to determine whether a rotation of a pair of columns or axes was indicated.
After [...] was rotated, the graphs involving these axes were replotted.
The new plots were again inspected to determine the need for additional
rotations. The process was continued [...] approximated simple structure
[...] an oblique configuration of factors, and in order to make relative
comparisons with the other methods, an oblique solution was obtained
graphically from the orthogonal solution. The oblique solution, in which the
factors are identified as verbal, numerical, mechanical, perceptual,
visualization, academic information, correct English usage, and socio-economic
background, is shown in Table 1, and the cosines of the angular separations
of the axes in Table 4.

The apparatus designed by Zimmerman [10], [...] board, T-square, and a
triangle, [...] eliminated the necessity of [...] after each rotation [...].


TABLE 1
Rotated Factor Loadings of the Solution by the Graphical Method
(All values to three decimal places)

Factors: I (V), II (N), III (Me), IV (P), V (Vz), VI (AI), VII (CU), VIII (SE)
[Table body, 35 variables by 8 factors, is not legible in the source scan.]
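
One step of the graphical procedure, the rotation of a single pair of axes,
amounts to a plane rotation of two columns of the loading matrix. The sketch
below assumes an orthogonal rotation through an angle read from a plot; the
function name, the angle, and the loadings are illustrative and are not taken
from the study.

```python
import math


def rotate_pair(loadings, p, q, degrees):
    """Orthogonally rotate columns p and q of a loading matrix.

    Every other column is left unchanged, so the communality of each
    variable (its row sum of squares) is preserved.
    """
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    rotated = []
    for row in loadings:
        new_row = list(row)
        new_row[p] = row[p] * c + row[q] * s
        new_row[q] = -row[p] * s + row[q] * c
        rotated.append(new_row)
    return rotated


# Hypothetical loadings on three factors for two variables.
before = [[0.6, 0.6, 0.1],
          [0.3, -0.2, 0.5]]
after = rotate_pair(before, 0, 1, 45.0)
```

After each such rotation the plots involving the two rotated columns would
be redrawn from the new loadings, which is the replotting step described in
the text.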


The Analytical Solution

Thurstone's analytical procedure is a single-plane method since the
coordinate hyperplanes are determined one at a time. The procedure was
started by selecting and normalizing test vector 15 (Numerical Operations),
which served as a trial vector to find the first reference vector. Test 15, as
required in the procedure, has some relatively high and low correlations in
the correlation matrix. The trial vector was then adjusted to a position $\Lambda_{I}$
which is the normal that defines the first hyperplane.

The test vector 31 (Gray-Votaw Reading), which has a low projection
on the first reference vector $\Lambda_{I}$, was selected to become the second trial vector.
(A projection was considered low when its absolute value was low relative
to the rest of the test projections on the same reference vector.) It was
employed to determine reference vector $\Lambda_{II}$. The test vector 18 (Tool Func-
tions) with low projections on $\Lambda_{I}$ and $\Lambda_{II}$ was selected as the third trial
vector to find reference vector $\Lambda_{III}$.
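
The arithmetic behind choosing a trial vector can be sketched as follows:
a test vector is normalized to unit length and every test is projected on it.
The helper names and the two-factor loadings are hypothetical, and the
adjustment of the trial vector to its final position is not shown.

```python
import math


def normalize(vector):
    """Scale a test vector to unit length so it can serve as a trial vector."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def projections(loadings, reference):
    """Project every test vector (row of loadings) on a reference vector."""
    return [sum(a * b for a, b in zip(row, reference)) for row in loadings]


# Hypothetical loadings of three tests on two factors.
tests = [[3.0, 4.0],
         [4.0, -3.0],
         [1.0, 1.0]]
trial = normalize(tests[0])
proj = projections(tests, trial)

# A test with a low absolute projection is a candidate for the next trial
# vector; here the candidate is simply the test with the smallest one.
next_candidate = min(range(len(proj)), key=lambda i: abs(proj[i]))
```

In this illustration the second test is orthogonal to the first trial vector,
so it would be the natural choice for the next trial vector.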

This process was continued until the number of reference vectors so
determined was equal to the number of factors in the centroid [...].
The eight factors identified by the analytical [...] identified in the
graphical [...] correct English usage and academic information factors,
[...] appear to be independent [...] the requirements for simple structure.
[...] by the analytical solution is given in Table 2 and the cosines of the
angular separations of the axes in Table 5.
The Direct-Rotational Solution

The method outlined by Harris [5] was used to obtain the third
solution. In this method the assumption is made that a given factor con-
figuration is consistent with a specific geometric model. The postulated
figure is a tetrahedron for 3-factor problems; a hypertetrahedron is
the analogous model for higher-dimensional problems. The objective is to
locate the primary axes at the intersections of the hyperplanes. The [...]
of points at the intersections of the hyperplanes provide direction [...]
which constitute a basis for deriving the transformation matrix. Once these
points are established, a single rotation is required to obtain the rotated
loadings of the variables on either primary or simple axes.


TABLE 2
Rotated Factor Loadings of the Solution by Thurstone's Analytical Method
(All values to three decimal places)

Factors: I (N), II (V), III (Me), IV (Vz), V (SE), VI (P), VII (AI), VIII (CU)
[Table body, 35 variables by 8 factors, is not legible in the source scan.]


The test vectors of the factor matrix [...] were extended to the mean
of the first centroid loadings rather than to unity, as is usually done, in
an effort to reduce distortion introduced by extending vectors with low
values on the first centroid. The extended loadings were plotted on combi-
nations of pairs of axes. The plots [...] of the eight primary [...]
as in the graphical [...] academic interest, were still too [...]
considered separate dimensions. A solution based [...] then tried and
accepted. This solution, which is presented in Table 3, satisfied [...]
of a factor configuration. [...] extended vector configuration [...];
(2) the intercorrelations of the factors should be small enough
to give evidence of their independence; and (3) the inverse of the inter-
correlation matrix of the factors should be computed satisfactorily, i.e.,
[...] less than unity. The cosines [...] are identified as verbal, numerical,
[...] and socio-economic background are [...].

Results

The rank of the factor configuration by the graphical method was
judged to be eight. [...] during rotation specified that,
if one or more of the factors achieved only [...] ($\pm$.20), it would be
residualized and the [...] eight. The correct English usage factor was not
identified with confidence, although it did not residualize. The application
of simple structure criteria, and an inspection of a set of plots of the [...]
rotated factors served as a final check on the adequacy of the rotational
solution.

TABLE 3
Rotated Factor Loadings of the Solution by the Direct-Rotational Method
(All values to three decimal places)

Factors: I (V), II (N), III (Me), IV (P), V (Vz), VI (SE)
[Table body, 35 variables by 6 factors, is not legible in the source scan.]


TABLE 4
Cosines of Angular Separations Between Simple Axes
Graphical Solution
(All entries to three decimal places)

          I     II    III     IV      V     VI    VII   VIII
          V      N     Me      P     Vz     AI     CU     SE

I      1000
II     -246   1000
III     095   -040    999
IV      092   -234    106   1000
V      -094   -138   -433   -143   1000
VI     -425   -056   -299   -258   -104   1000
VII     006   -085    009    051   -120   -196   1000
VIII   -160   -043    063    009    098   -101   -036   1000


TABLE 5
Cosines of Angular Separations Between Simple Axes
Thurstone Analytical Method
(All entries to three decimal places)

          I     II    III     IV      V     VI    VII   VIII
          N      V     Me     Vz     SE      P     AI     CU

I      1000
II     -296   1000
III     111   -025   1000
IV      006   -157   -236   1000
V       080   -059    051    152   1000
VI      166    024    104   -148    003   1000
VII    -153    819  [...]  [...]  [...]    096   1000
VIII  [...]    831    022   -135    144    125    872   1000


TABLE 6
Cosines of Angular Separations Between Simple Axes
Direct Rotational Method
(All entries to three decimal places)

          I     II    III     IV      V     VI
          V      N     Me      P     Vz     SE

I      1000
II     -494   1000
III    -033   -112   1000
IV      086   -328    136   1000
V      -172    097   -467   -628   1000
VI     -520    278   -176   -263    213   1000


The analytical solution identified the same eight factors. However,
the application of the simple structure criteria placed the correct English
usage and academic information factors in a doubtful category [3]. The
similarity of the verbal, correct English usage, and academic information
factors was indicated by a questionable fulfilment of the requirement that
for every pair of factors there should be several variables with projections
on one factor vector but not on the other. There was also a question as to
whether the number of tests was small enough to conform to the requirement
that for every pair of factors only a small number of variables should have
appreciable loadings on any pair of factors.

Three separate solutions, based on the assumptions that the rank of
the factor configuration was eight, seven, and six, respectively, were obtained
by the direct-rotational method. The hypothesis regarding six factors was
accepted because the solution produced a figure consistent with the postulated
geometric model; the intercorrelation matrix of the six factors gave evidence
of their independence; and the inverse of the intercorrelation matrix of the
six factors was computed satisfactorily.

The disagreement in the rank of the factor configuration may be at-
tributed to the restriction imposed by the direct-rotational method in regard
to the location of reference axes. The criteria were strict and tended to
determine conservatively a rank characterized by highly [...].
On the other hand, the graphical method permitted [...]
the reference axes of the verbal domain, so long as [...]
criteria were not violated. The placement of reference [...] was
more objective by the analytical method. One [...] factor was
located at a time, and the rank was determined, in the final analysis, by the
application of the simple structure criteria. The rank of the factor configura-
tion seemed to be most precisely determined by the direct-rotational method,
since criteria for the locations and angular separations of the primary axes
were more definitely specified by the additional requirements.
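
The requirement that the inverse of the factor intercorrelation matrix be
computable can be illustrated numerically. The sketch assumes the standard
relation between reference (simple) axes and primary axes, namely that the
primary-axis correlations are obtained by inverting the matrix of
reference-axis cosines and rescaling the result to a unit diagonal. The
helper names are hypothetical, and the 2 x 2 input is invented for the
demonstration rather than taken from Tables 4 to 6.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting, in pure Python."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [x - factor * y for x, y in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]


def primary_correlations(reference_cosines):
    """Correlations among primary axes from cosines among reference axes."""
    inverse = invert(reference_cosines)
    n = len(inverse)
    d = [1.0 / math.sqrt(inverse[i][i]) for i in range(n)]
    return [[d[i] * inverse[i][j] * d[j] for j in range(n)] for i in range(n)]


# Two reference axes with cosine -0.5 (hypothetical values).
demo = primary_correlations([[1.0, -0.5], [-0.5, 1.0]])
```

When two reference axes nearly coincide the matrix approaches singularity,
and `invert` raises an error instead of returning unstable values. This
mirrors the situation in which two factors are too similar to be treated as
separate dimensions.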

The comparison of the six factors common to the three solutions indicated
no appreciable differences in approximating simple structure. The largest
number of vanishingly small loadings ($a_j = \pm .20$) on each of the six factors
was used as a discriminant. The graphical solution had a slight edge over
the other two solutions in this respect.
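
The discriminant described above, the number of vanishingly small loadings
on each factor, is a simple count. A minimal sketch follows, assuming that
"vanishingly small" means an absolute loading no greater than .20; the
function name and the loadings are hypothetical.

```python
def near_zero_counts(loadings, bound=0.20):
    """Count, for each factor (column), the loadings with |a| <= bound."""
    n_factors = len(loadings[0])
    return [sum(1 for row in loadings if abs(row[k]) <= bound)
            for k in range(n_factors)]


# Hypothetical rotated loadings of three variables on two factors.
demo_loadings = [[0.10, 0.50],
                 [-0.15, 0.30],
                 [0.60, 0.05]]
counts = near_zero_counts(demo_loadings)
```

Comparing such counts factor by factor across the three solutions gives the
kind of tally on which the graphical solution held its slight edge.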


The solution obtained by the direct-rotational method was most eco-
nomical with respect to amount of time. It required 88 man-hours to reach
a solution, which was approximately 30 and 50 hours less than the analytical
and graphical solutions respectively.

The analytical method was least demanding of judgmental decisions
on the part of the rotater. Judgment was required in the selection of a variable
to serve as the initial trial vector for each factor. The remainder of the method
consists of routine statistical calculations. The major judgmental portion
of the direct-rotational method consisted of locating the intersections of the
hyperplanes. This portion was concentrated near the beginning of the pro-
cedure, with the rest of the method consisting of routine statistical calcu-
lations. The graphical method required a judgment for each rotation of a
pair of axes and was most demanding in this respect.


Conclusions

The direct-rotational method, in batteries for which it is suitable (i.e.,
batteries with a central factor on which all the variables have at least moderate
positive loadings), provides a better basis for determining [...]
factor configuration [...] the present study [...] number of near[...]
rotational solution.


REFERENCES

[1] Carroll, J. B. An analytical solution for approximating simple structure in factor
analysis. Psychometrika, 1953, 18, 23-38.
[2] Fruchter, B. Orthogonal and oblique solutions [...] of aptitude, achievement,
and background variables. [...]
[3] Fruchter, B. Introduction to factor analysis. [...] Van Nostrand, 1954.
[4] Harris, C. W. Direct rotation to primary structure. J. educ. Psychol., 1948, 39,
449-468.
[5] Harris, C. W. Projections of three types of factor patterns. J. exp. Educ., 1949, 17,
335-345.
[6] Harris, C. W. and Knoell, D. M. The oblique solution in factor analysis. J. educ.
Psychol., 1948, 39, 385-403.
[7] Neuhaus, J. O. and Wrigley, C. The quartimax method: an analytical approach to
orthogonal simple structure. Brit. J. statist. Psychol., 1954, 7, 81-91.
[8] Saunders, D. R. An analytical method for rotation to orthogonal simple structures.
Educational Testing Service Research Bulletin, 53-10, August 1953.
[9] Thurstone, L. L. An analytical method for simple structure. Psychometrika, 1954, 19,
173-182.
[10] Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psycho-
metrika, 1946, 11, 51-55.


Manuscript received 9/28/57
Revised manuscript received 1/6/58


PSYCHOMETRIKA—VOL. 23, NO. 3
SEPTEMBER, 1958


ANALYSIS OF VARIANCE FOR CORRELATED OBSERVATIONS


RAYMOND O. COLLIER, JR.
UNIVERSITY OF MINNESOTA

[...]-dimensional [...] correlations exist between certain pairs of
[...] of correlated observations, classical [...]. The models [...]
and the split-split-plot design. In the first model [...] observations
of [...]. In the second model the observations of the levels of [...]
correlated to degree $\rho_1$, whereas the observations [...]
correlated to degree $\rho_2$. Analyses for the two models [...]
for various parameters are indicated.
In the social sciences it is generally agreed that the analysis of data
involving several observations on a subject is of basic interest and importance.
Typically, several different variables are observed for every experimental
subject. A special case arises when several repeated or multiple observations
are made on a single variable for every individual subject, each successive
observation usually being associated with a different level or combination
of dimensions of the design. In both kinds of settings it is usual to assume
that the observations associated with a single subject are correlated.

Concern over such experiments involving repeated measurements was
manifested quite early in statistics and the social sciences. Recent con-
sideration of this problem has been made by several writers. Kogan [9]
outlined an analysis for a design in which several groups of subjects were
administered different experimental treatments with each individual subject
being measured on several trials. Edwards [4] devoted a chapter to the
analysis of designs which involved repeated measurements on the same
subjects. Later, in surveying the status of experimental design in psychology,
Kogan [10] pointed out the relevance of the split-plot design in experi-
ments for which repeated observations were taken on each subject. Perhaps
the most comprehensive treatment of the design and analysis of experiments
involving repeated measurements for two- and three-dimensional [...]
has been given by Lindquist [12] in his chapter on mixed designs. [...]
example of the application of the split-plot principle in [...]
drawing involving repeated or correlated observations was [...]
Stunkard, et al. [7]. Moonan [13, 14], in considering test reliability [...]
experimental design, presented a model which involved observations assumed
to be correlated.

Perhaps the first presentations of the analyses of designs in which
correlations between observations could be assumed were made by Yates
[18, 19, 20]. In these papers Yates presented the analysis of the split-plot
[...] as it applied to agricultural experimentation. Later, Nandi [...]
of possible tests of hypotheses in split-plot designs. The work of Halperin [5]
parallels that of Nandi, [...] assumed by both writers being similar.
[...] the split-plot design and variations [...] and Kempthorne [8].
In addition [...] references pertaining to the split-plot design and the
analysis of designs with correlated observations has been made by the
author [3].

[...] relative to the split-plot design has been [...] than the social
sciences. The result has been [...] design and certain variations of it
are not [...] the social sciences. Moreover, the simple [...] usually
presented, may be generalized to higher dimensions only if a good deal of
analysis of variance insight [...]. Finally, it has been practically impossible
to find [...] generalized designs presented in the literature, let alone [...].

[...] this paper to study a class of designs in which data of the sort
mentioned [...] will fall into the general setting [...]. [...] analyses
included will [...] of variance as applied to experimental [...].
A Four-Dimensional System with Constant Correlations Between
Levels of One Dimension

It should be sufficient for many investigations to consider layouts in
which there is a [...] dimensions and equal frequencies in the
subclasses. The [...] to experimental treatments [...] that limitations
on the possible number of [...] imposed by scheduling, or the [...]
multiple measurements [...] subject of a sample. Suppose further that each
of these observations refers to an observation made under a level of a
particular dimension of the design. If the dimension is experimental, [...]
dimension s, the tth level of [...] dimension to which [...] combi-
nations of r, s, and t. The subscripts on the random variables, $X_{qrstu}$, will
be simple to follow if q = 1, 2, ..., Q; r = 1, 2, ..., R; s = 1, 2, ..., S;
t = 1, 2, ..., T; and u = 1, 2, ..., U. Thus, the QRST subjects, i.e., Q
different subjects in each of RST subclasses or dimension level combinations,
are measured on each of U levels of one dimension.

As previously stated, assume that correlations exist between pairs of
observations taken on the same subject. One model which may be utilized
assumes that the random variables $X_{qrstu}$ are QRSTU-variate normally
distributed with variances and covariances as follows.


(1)    $V[X_{qrstu}] = \sigma^2$;

       $\mathrm{Cov}[X_{qrstu}, X_{q'r's't'u'}] = \rho\sigma^2$, for $q = q'$, $r = r'$, $s = s'$, $t = t'$, $u \neq u'$,
                                  $= 0$ otherwise.

Let the expected value of a single observation, given as a linear function of
fixed constants, be identical to that usually expressed in Model I of the analysis
of variance.

(2)    $E(X_{qrstu}) = \mu + \alpha(1, r) + \alpha(2, s) + \alpha(3, t) + \alpha(4, u) + \beta(1, rs)$
                $+ \beta(2, rt) + \beta(3, st) + \beta(4, ru) + \beta(5, su) + \beta(6, tu)$
                $+ \gamma(1, rst) + \gamma(2, rsu) + \gamma(3, rtu) + \gamma(4, stu) + \delta(rstu)$


where the fixed constants are defined as follows:

$\mu$ refers to the general effect;

$\alpha$ refers to the main effects of the different dimensions indicated, the
numerals being employed to distinguish the main effects of different
dimensions when specific values of r, s, t, or u are considered;

$\beta$ refers to the first-order interactions between the two dimensions
indicated;

$\gamma$ refers to the second-order interactions between the three dimensions
indicated; and

$\delta$ refers to the third-order interaction constants between all four
dimensions.


Since the fixed model is assumed, the conditions concerning the distribution
of the $X_{qrstu}$ are equivalent to

       $X_{qrstu} = E(X_{qrstu}) + \epsilon_{qrstu}$

where the $\epsilon_{qrstu}$ refer to random effects. The random components, $\epsilon_{qrstu}$, are
assumed to be QRSTU-variate normally distributed with

(3)    $E(\epsilon_{qrstu}) = 0$,

       $V(\epsilon_{qrstu}) = \sigma^2$,

and

       $\mathrm{Cov}(\epsilon_{qrstu}, \epsilon_{q'r's't'u'}) = \rho\sigma^2$, for $q = q'$, $r = r'$, $s = s'$, $t = t'$, $u \neq u'$,
                                  $= 0$ otherwise.

In addition, the following parametric restrictions are imposed: [...]
subscript involved is zero. [...] made on the variances and correlations state
[...] observation $X_{qrstu}$ is $\sigma^2$, the correlations
between any two observations made on the same subject are constantly $\rho$,
[...] between two observations not made on the same subject [...]
are stochastically independent.

[...] of an orthogonal transformation, to transform [...] of variates
which are independent within a set and between [...] have constant
variances within a [...] set to set. This transform, due [...]

(4)    [...]

[...] (1) there are QRST(U - 1) variates $Y_{qrstu}$, u = 1, ..., U - 1,
[...] normally distributed with constant variance $\sigma_1^2 = \sigma^2(1 - \rho)$.
[...] $Z_{qrst}$, it follows that in set (2) there are [...] distributed with
constant variance $\sigma_2^2 = \sigma^2[1 + (U - 1)\rho]$, which are also independent [...]
variates in set (1). Since the transformation of (4) is orthogonal, [...]
immediately [...] associated with the variates of set (1), the $Y_{qrstu}$
(u = 1, ..., U - 1), [...]

(5)    $\sum_q \sum_r \sum_s \sum_t \sum_{u=1}^{U-1} [Y_{qrstu} - E(Y_{qrstu})]^2$
         $= \sum_q \sum_r \sum_s \sum_t \sum_u [X_{qrstu} - X_{qrst.}/U - \alpha(4, u) - \beta(4, ru) - \beta(5, su)$
           $- \beta(6, tu) - \gamma(2, rsu) - \gamma(3, rtu) - \gamma(4, stu) - \delta(rstu)]^2$,

and that the sum of squares associated with the variates of set (2), $Z_{qrst}$, is

(6)    $\sum_q \sum_r \sum_s \sum_t [Z_{qrst} - E(Z_{qrst})]^2$
         $= U \sum_q \sum_r \sum_s \sum_t [X_{qrst.}/U - \mu - \alpha(1, r) - \alpha(2, s) - \alpha(3, t) - \beta(1, rs)$
           $- \beta(2, rt) - \beta(3, st) - \gamma(1, rst)]^2$,

where, in the above expressions, $X_{qrst.} = \sum_u X_{qrstu}$ and any sum expressed
without limits of summation is to be interpreted as a sum over the entire
range of the subscript.
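
The variances quoted for the two sets of variates can be checked numerically
for a single subject. The sketch below uses a Helmert matrix as the orthogonal
transformation. This is an assumption made for illustration: any orthogonal
matrix whose last row is proportional to a vector of ones gives the same
variances, and the matrix shown is not claimed to be the transformation
of (4). The function names and parameter values are hypothetical.

```python
import math


def compound_symmetry(u_levels, sigma2, rho):
    """Covariance matrix of one subject's U observations under model (1)."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(u_levels)]
            for i in range(u_levels)]


def helmert(u_levels):
    """Orthogonal matrix: U - 1 contrast rows followed by a scaled-sum row."""
    rows = []
    for k in range(1, u_levels):
        scale = 1.0 / math.sqrt(k * (k + 1))
        rows.append([scale] * k + [-k * scale] + [0.0] * (u_levels - k - 1))
    rows.append([1.0 / math.sqrt(u_levels)] * u_levels)
    return rows


def transform_covariance(h, cov):
    """Return H * cov * H', the covariance of the transformed variates."""
    n = len(cov)
    hc = [[sum(h[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(hc[i][k] * h[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Illustrative values: U = 4 levels, sigma^2 = 2, rho = 0.3.
result = transform_covariance(helmert(4), compound_symmetry(4, 2.0, 0.3))
```

With these values the first three transformed variates have variance
$\sigma^2(1 - \rho) = 1.4$, the last has variance $\sigma^2[1 + (U - 1)\rho] = 3.8$, and all
covariances vanish, in agreement with the variances stated for sets (1)
and (2).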
Noting that the parameters in (5) are involved there only, and a similar
statement is true of (6), normal regression theory, e.g., as presented by
Wilks [17], or one of other equivalent techniques may be utilized to derive
tests of hypotheses on parameters involved in the two variate sets. The
quantities necessary to test these hypotheses are presented in the analysis
of variance of Table 1, and the notation employed in the specification of the
test statistics follows this table.

Tests of hypotheses are carried out as is usual in the analysis of the
split-plot design. That is, there is an error term for each of the two sets
which is used to test hypotheses in a particular set. For example, the test of
the hypothesis, $H_0: \delta(rstu) = 0$, in set (1) is made by forming a ratio of the
mean square related to the hypothesis and the mean square due to error (1),

(7) $F(\delta) = \dfrac{\mathrm{S.S.}[\delta(rstu)]/(R - 1)(S - 1)(T - 1)(U - 1)}{\mathrm{S.S.(Error\ 1)}/(Q - 1)(RST)(U - 1)}$,

where S.S.[$\delta(rstu)$] and S.S.(Error 1) refer to sums of squares appearing
in Table 1. $F(\delta)$, under $H_0$, is distributed as Snedecor's F distribution with
$(R - 1)(S - 1)(T - 1)(U - 1)$ and $(Q - 1)(RST)(U - 1)$ degrees of
freedom. Therefore, to test $H_0$ above, with a significance level of $\alpha$, require,
as is usual, that $F(\delta)$ exceed the upper $\alpha$-point of this particular F distribution.

Tests of other hypotheses in set (1) are made by forming analogous
F ratios using the denominator as in (7) above (see [8] pp. 91, 375). Concerning
the pooling problem the reader should refer to a discussion of some of the
consequences of pooling nonsignificant interaction effects with error
given by Binder [1]. It should also be noted that a testing procedure
proposed by Hartley [6] seems promising here and elsewhere.

Tests of hypotheses on parameters involved in set (2) follow analogous
lines. Thus, as an example, the test of the hypothesis, $H_0: \gamma(1, rst) = 0$,
is made by means of the F ratio,

(8) $F[\gamma(1)] = \dfrac{\mathrm{S.S.}[\gamma(1, rst)]/(R - 1)(S - 1)(T - 1)}{\mathrm{S.S.(Error\ 2)}/(Q - 1)(RST)}$.
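The F ratios in (7) and (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sums of squares used in the example are hypothetical, and the degrees of freedom are those read from Table 1.

```python
def table1_df(Q, R, S, T, U):
    """Degrees of freedom for the lines of Table 1 used in (7) and (8)."""
    return {
        "delta(rstu)": (R - 1) * (S - 1) * (T - 1) * (U - 1),
        "error1": (Q - 1) * R * S * T * (U - 1),
        "gamma(1,rst)": (R - 1) * (S - 1) * (T - 1),
        "error2": (Q - 1) * R * S * T,
    }


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Mean square for the effect divided by the mean square for error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Hypothetical layout and sums of squares, chosen only for illustration.
df = table1_df(Q=3, R=2, S=2, T=2, U=4)
F_delta = f_ratio(12.0, df["delta(rstu)"], 48.0, df["error1"])   # set (1), error (1)
F_gamma = f_ratio(6.0, df["gamma(1,rst)"], 32.0, df["error2"])   # set (2), error (2)
```

Each ratio would then be referred to the F distribution with the corresponding pair of degrees of freedom. The point of the sketch is that an effect in set (1) is always paired with error (1) and an effect in set (2) with error (2).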


It is worth noting that the final result has been that of dividing the total
sum of squares into two portions, namely the sum of squares associated
with the variates $Y_{qrstu}$, $u = 1, 2, \ldots, U - 1$, given by

$\sum_q \sum_r \sum_s \sum_t \sum_u X_{qrstu}^2 - \sum_q \sum_r \sum_s \sum_t X_{qrst\cdot}^2 / U$

(which is seen to be the sums of squares between observations on the same
individual summed over all individuals) and the uncorrected sum of squares
between individuals,

$\sum_q \sum_r \sum_s \sum_t X_{qrst\cdot}^2 / U$.


TABLE 1

... of Levels of One Dimension

Source of          Degrees of                Sum of Squares
Variation          Freedom

Total
Set (1)            QRST(U-1)                 a' - b'
(Within
individuals)
α(4,u)             U-1                       o - p
β(4,ru)            (R-1)(U-1)                i - l - o + p
β(5,su)            (S-1)(U-1)                j - m - o + p
β(6,tu)            (T-1)(U-1)                k - n - o + p
γ(2,rsu)           (R-1)(S-1)(U-1)           c - f - i - j + l + m + o - p
γ(3,rtu)           (R-1)(T-1)(U-1)           d - g - i - k + l + n + o - p
γ(4,stu)           (S-1)(T-1)(U-1)           e - h - j - k + m + n + o - p
δ(rstu)            (R-1)(S-1)(T-1)(U-1)      a - b - c - d - e + f + g + h + i + j + k - l - m - n - o + p
Error (1)          (Q-1)(RST)(U-1)           a' - b' - a + b

Total
Set (2)            QRST                      b'
(Between
individuals)
μ                  1                         p
α(1,r)             R-1                       l - p
α(2,s)             S-1                       m - p
α(3,t)             T-1                       n - p
β(1,rs)            (R-1)(S-1)                f - l - m + p
β(2,rt)            (R-1)(T-1)                g - l - n + p
β(3,st)            (S-1)(T-1)                h - m - n + p
γ(1,rst)           (R-1)(S-1)(T-1)           b - f - g - h + l + m + n - p
Error (2)          (Q-1)(RST)                b' - b


Similarly, the error sum of squares in set (1) may be observed to be the sum
of the interactions of individual and dimension u, whereas the error sum of
squares in set (2) is simply the sum of squares between individuals in a
subclass summed over subclasses. The sums of squares associated with the various
effects are computed in the usual fashion.

The parameters listed in the “Source of Variation” column of Table 1
should be interpreted as referring to the source of variation associated with
the test of hypothesis on the parameters of the effect being considered. Thus
$\alpha(4, u)$ refers to the source of variation connected with $H_0: \alpha(4, u) = 0$.

The mean squares have not been given in Table 1 since they are computed
in the usual fashion. The expected mean squares for error (1) and error (2)
may be shown to be

$\sigma_1^2 = \sigma^2(1 - \rho)$ and $\sigma_2^2 = \sigma^2[1 + (U - 1)\rho]$,

respectively.
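The two error variances can be checked numerically for one subject's U equicorrelated observations. The sketch below is an illustration, not the author's transformation (4): it assumes a Helmert-type orthogonal matrix, whose last row is the normalized sum over u, and the values of U, sigma2 and rho are arbitrary.

```python
from math import sqrt


def helmert(n):
    """Orthogonal n x n matrix: n - 1 contrast rows, then the row of 1/sqrt(n)."""
    rows = []
    for j in range(1, n):
        norm = sqrt(j * (j + 1))
        rows.append([1 / norm] * j + [-j / norm] + [0.0] * (n - j - 1))
    rows.append([1 / sqrt(n)] * n)
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def equicorrelated(n, sigma2, rho):
    """Covariance matrix with variance sigma2 and common correlation rho."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(n)] for i in range(n)]


U, sigma2, rho = 4, 2.0, 0.3                      # arbitrary illustrative values
H = helmert(U)
H_t = [list(col) for col in zip(*H)]
D = matmul(matmul(H, equicorrelated(U, sigma2, rho)), H_t)

var_within = sigma2 * (1 - rho)                   # first U - 1 transformed variates
var_between = sigma2 * (1 + (U - 1) * rho)        # last transformed variate
```

Under these assumptions D is diagonal, with var_within in its first U - 1 positions and var_between in the last, which is the pattern the two expected mean squares describe.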
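In Tables 1 and 2 the notation employed is given by

(9)
$a' = \sum\sum\sum\sum\sum X_{qrstu}^2$,  $b' = \sum\sum\sum\sum X_{qrst\cdot}^2/U$,  $c' = \sum\sum\sum X_{qrs\cdot\cdot}^2/TU$,
$a = \sum\sum\sum\sum X_{\cdot rstu}^2/Q$,  $b = \sum\sum\sum X_{\cdot rst\cdot}^2/QU$,
$c = \sum\sum\sum X_{\cdot rs\cdot u}^2/QT$,  $d = \sum\sum\sum X_{\cdot r\cdot tu}^2/QS$,
$e = \sum\sum\sum X_{\cdot\cdot stu}^2/QR$,  $f = \sum\sum X_{\cdot rs\cdot\cdot}^2/QTU$,
$g = \sum\sum X_{\cdot r\cdot t\cdot}^2/QSU$,  $h = \sum\sum X_{\cdot\cdot st\cdot}^2/QRU$,
$i = \sum\sum X_{\cdot r\cdot\cdot u}^2/QST$,  $j = \sum\sum X_{\cdot\cdot s\cdot u}^2/QRT$,
$k = \sum\sum X_{\cdot\cdot\cdot tu}^2/QRS$,  $l = \sum X_{\cdot r\cdot\cdot\cdot}^2/QSTU$,
$m = \sum X_{\cdot\cdot s\cdot\cdot}^2/QRTU$,  $n = \sum X_{\cdot\cdot\cdot t\cdot}^2/QRSU$,
$o = \sum X_{\cdot\cdot\cdot\cdot u}^2/QRST$,  $p = X_{\cdot\cdot\cdot\cdot\cdot}^2/QRSTU$.

A dot used here in place of a subscript refers to the fact that the sum has
been taken over that subscript.

For the comparison of individual means by use of one of the techniques
available for this task, e.g., Duncan's multiple range test [11], the following
typical variances are given.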


a. The difference between two over-all means of dimension u,
$\bar{X}_{\cdot\cdot\cdot\cdot u} - \bar{X}_{\cdot\cdot\cdot\cdot u'}$, has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 1)}}{QRST}$.

b. The difference between two means of dimension t for a given level of
dimension u, $\bar{X}_{\cdot\cdot\cdot tu} - \bar{X}_{\cdot\cdot\cdot t'u}$, has a variance which is estimated by

$\dfrac{2[(U - 1)\,\mathrm{M.S.(Error\ 1)} + \mathrm{M.S.(Error\ 2)}]}{QRSU}$.

The variances of differences between means of dimensions r or s for particular
levels of u take analogous forms.

c. The difference between two means of dimension u for a particular
level of t, $\bar{X}_{\cdot\cdot\cdot tu} - \bar{X}_{\cdot\cdot\cdot tu'}$, has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 1)}}{QRS}$.

Variances of the differences between two means of dimension u for particular
levels of other dimensions take analogous forms.

d. The difference between two means of dimension t, $\bar{X}_{\cdot\cdot\cdot t\cdot} - \bar{X}_{\cdot\cdot\cdot t'\cdot}$,
has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 2)}}{QRSU}$.

Variances of the differences between means of dimensions r and s take
analogous forms.

e. The difference between two means of dimension t for a particular
level of s, $\bar{X}_{\cdot\cdot st\cdot} - \bar{X}_{\cdot\cdot st'\cdot}$, has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 2)}}{QRU}$.

Variances of differences between means of dimension t for particular levels
of r and also between two means of other combinations of r, s and t follow
analogous forms.
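The typical variances a through e for this first system can be collected in one small Python function. This is a sketch under stated assumptions: the function name and the numerical inputs are hypothetical, the formula for case d is taken to follow the same reasoning as case e with all QRS subjects at a level of t contributing, and the mean squares in the example are set equal to their expected values for sigma2 = 2 and rho = 0.3.

```python
def variance_of_differences(ms_error1, ms_error2, Q, R, S, T, U):
    """Estimated variances of the typical differences a to e for the first system."""
    return {
        "a": 2 * ms_error1 / (Q * R * S * T),                         # two u means
        "b": 2 * ((U - 1) * ms_error1 + ms_error2) / (Q * R * S * U),  # two t means at one u
        "c": 2 * ms_error1 / (Q * R * S),                             # two u means at one t
        "d": 2 * ms_error2 / (Q * R * S * U),                         # two t means
        "e": 2 * ms_error2 / (Q * R * U),                             # two t means at one s
    }


# Illustrative values only: mean squares equal to their expectations.
sigma2, rho = 2.0, 0.3
Q, R, S, T, U = 3, 2, 2, 2, 4
ms1 = sigma2 * (1 - rho)
ms2 = sigma2 * (1 + (U - 1) * rho)
v = variance_of_differences(ms1, ms2, Q, R, S, T, U)
```

With these inputs case b reduces to 2 * sigma2 / (Q * R * S). This is the reason for pooling the two error mean squares in that case: the two means being compared come from different subjects, so the full variance of a single observation is involved.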


A Four-Dimensional System with Levels of One Dimension Correlated $\rho_1$ and
Levels of a Second Dimension Correlated $\rho_2$


Suppose that Q different subjects ar 


ow let all QRS s 
vels of dimensi 
der TU dimen 


each of RS subclasses 
ubjects be administered 
Оп t. Here each subject 
sion leye] combinations. 
of levels of dimension v 
er from the correlations 


between observations taken under different levels of t. For example, the
levels of dimension t might refer to various points in time at which each subject
is administered all levels of u.
For this situation let the model state that the random variables $X_{qrstu}$
are QRSTU-variate normally distributed with

(10) $V(X_{qrstu}) = \sigma^2$;
$\mathrm{cov}(X_{qrstu}, X_{q'r's't'u'})$
$= \rho_1\sigma^2$, for $q = q'$, $r = r'$, $s = s'$, $t = t'$, $u \neq u'$,
$= \rho_2\sigma^2$, for $q = q'$, $r = r'$, $s = s'$, $t \neq t'$,
$= 0$ otherwise;

and

(11) $E(X_{qrstu})$ = the linear function given in (2).
variates to independent sets by means of an ortho- 


Again transform the 
lied twice. First transform Хш, by means of 


gonal transformation app 


p = More е = Ene 


U 
3 Xen 


Venu = VU |^ 
ansformed by the same form of the trans- 


of (12) are tr h 
f which the Zarsru are labeled W,.,;; - Resulting from 


(12) 


Thereafter, the Уш 
formation to Zarstu , O i 

spon dX TOE OE E e You. (u = 
1, 2, ... , U — 1), normally and independently distributed with variance 
„2, و‎ 1 ч PAA кыЛ дле ек, 
с? =o" [1 — p]; а set (2) of QRS(T — 1) variates, ни ( = 1, 2, А 


i istributed with variance о; = o 

T es ` d independently distributed | 

Т a og Uol; and a set (3) of QRS variates, Worry , Мао, аге 

~ А y H B H = 2 ee 
normally and independently distibuted i adis жс 3 т 2 1 M : 
(т = ain these variates аг : r 

1), = DM сс of squares associated D set (1), set (2), and 

set (3) are given by (13); (14), and (15), respectively. 


u-1 " 
кэ ху М З 
р 


)- qQ, rsu) = yG, rte 


uel 


á Ж Юр (5, tu i) — y(4, stu) 


— érstu) | - 


T-1 " 
gos EEL 22 uso — BZ) 


-UEZII[Ée-A7ee-.50 005 


2 


— 8(8, si) — xa, ro | 
(15) Y Oy Weg EW gear) 


di: 7, УХ у. [== T H> о(1,т) — о(2, s) — а, | s 
As before, tests of hypotheses on parameters involved in the three variate
sets may be derived. These tests are given below and the analysis of variance
follows in Table 2.

In set (1) the test of the hypothesis of no third-order interaction,
$H_0: \delta(rstu) = 0$, is accomplished by forming the F ratio

(16) $F(\delta) = \dfrac{\mathrm{S.S.}[\delta(rstu)]/(R - 1)(S - 1)(T - 1)(U - 1)}{\mathrm{S.S.(Error\ 1)}/(Q - 1)(RST)(U - 1)}$.

Other hypotheses in set (1) are tested by forming analogous F ratios.
Again in set (2) we test the hypothesis, $H_0: \gamma(1, rst) = 0$, by referring
the ratio

(17) $F[\gamma(1)] = \dfrac{\mathrm{S.S.}[\gamma(1, rst)]/(R - 1)(S - 1)(T - 1)}{\mathrm{S.S.(Error\ 2)}/(Q - 1)(RS)(T - 1)}$

to its proper F distribution and other hypotheses of set (2) are tested by
forming comparable F ratios.

Finally, in set (3) the hypothesis, $H_0: \beta(1, rs) = 0$, can be tested by forming
the F ratio

(18) $F[\beta(1)] = \dfrac{\mathrm{S.S.}[\beta(1, rs)]/(R - 1)(S - 1)}{\mathrm{S.S.(Error\ 3)}/(Q - 1)(RS)}$.

The expected mean squares for the three error terms are

$E\{\mathrm{M.S.(Error\ 1)}\} = \sigma_1^2 = \sigma^2(1 - \rho_1)$,
$E\{\mathrm{M.S.(Error\ 2)}\} = \sigma_2^2 = \sigma^2[1 + (U - 1)\rho_1 - U\rho_2]$,
$E\{\mathrm{M.S.(Error\ 3)}\} = \sigma_3^2 = \sigma^2[1 + (U - 1)\rho_1 + U(T - 1)\rho_2]$.
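The three error variances can be checked against the covariance structure of one subject's TU observations. The sketch below is illustrative and rests on an assumption about the garbled model statement: $\rho_1$ is taken as the correlation between observations at different levels of u within the same level of t, and $\rho_2$ as the correlation between observations at different levels of t. The function names and numerical values are hypothetical.

```python
def subject_covariance(T, U, sigma2, rho1, rho2):
    """Covariance matrix of one subject's T*U observations, ordered by (t, u)."""
    cells = [(t, u) for t in range(T) for u in range(U)]

    def entry(a, b):
        if a == b:
            return sigma2
        return rho1 * sigma2 if a[0] == b[0] else rho2 * sigma2

    return [[entry(a, b) for b in cells] for a in cells]


def is_eigenpair(matrix, vector, value, tol=1e-9):
    """True when matrix @ vector equals value * vector within tol."""
    product = [sum(x * y for x, y in zip(row, vector)) for row in matrix]
    return all(abs(p - value * v) < tol for p, v in zip(product, vector))


T, U, sigma2, rho1, rho2 = 3, 4, 2.0, 0.5, 0.2    # arbitrary illustrative values
cov = subject_covariance(T, U, sigma2, rho1, rho2)

within_t = [1.0, -1.0] + [0.0] * (T * U - 2)            # contrast among u inside one t
between_t = [1.0] * U + [-1.0] * U + [0.0] * ((T - 2) * U)  # contrast between two t totals
subject_total = [1.0] * (T * U)                         # total over all t and u

var1 = sigma2 * (1 - rho1)
var2 = sigma2 * (1 + (U - 1) * rho1 - U * rho2)
var3 = sigma2 * (1 + (U - 1) * rho1 + U * (T - 1) * rho2)
checks = [
    is_eigenpair(cov, within_t, var1),
    is_eigenpair(cov, between_t, var2),
    is_eigenpair(cov, subject_total, var3),
]
```

Each of the three vectors is representative of one of the variate sets, so the three variances are the characteristic roots of the assumed covariance matrix, and an orthogonal transformation built from such contrasts yields independent variates with exactly these variances.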


TABLE 2

Analysis of Variance for a Four Dimensional Layout in Which the
Observations of Levels of One Dimension Are Correlated $\rho_1$ and of the
Levels of Another Dimension Are Correlated $\rho_2$

Source of          Degrees of                Sum of Squares
Variation          Freedom

Total
Set (1)            QRST(U-1)                 a' - b'
(Between "u"
within "t"
summed over
individuals)
α(4,u)             U-1                       o - p
β(4,ru)            (R-1)(U-1)                i - l - o + p
β(5,su)            (S-1)(U-1)                j - m - o + p
β(6,tu)            (T-1)(U-1)                k - n - o + p
γ(2,rsu)           (R-1)(S-1)(U-1)           c - f - i - j + l + m + o - p
γ(3,rtu)           (R-1)(T-1)(U-1)           d - g - i - k + l + n + o - p
γ(4,stu)           (S-1)(T-1)(U-1)           e - h - j - k + m + n + o - p
δ(rstu)            (R-1)(S-1)(T-1)(U-1)      a - b - c - d - e + f + g + h + i + j + k - l - m - n - o + p
Error (1)          (Q-1)(RST)(U-1)           a' - b' - a + b

Total
Set (2)            QRS(T-1)                  b' - c'
(Between "t"
summed over
individuals)
α(3,t)             T-1                       n - p
β(2,rt)            (R-1)(T-1)                g - l - n + p
β(3,st)            (S-1)(T-1)                h - m - n + p
γ(1,rst)           (R-1)(S-1)(T-1)           b - f - g - h + l + m + n - p
Error (2)          (Q-1)(RS)(T-1)            b' - c' - b + f

Total
Set (3)            QRS                       c'
(Between
individuals)
μ                  1                         p
α(1,r)             R-1                       l - p
α(2,s)             S-1                       m - p
β(1,rs)            (R-1)(S-1)                f - l - m + p
Error (3)          (Q-1)(RS)                 c' - f

The variances of certain typical differences between means are as follows.

a. The difference between two means of dimension u, $\bar{X}_{\cdot\cdot\cdot\cdot u} - \bar{X}_{\cdot\cdot\cdot\cdot u'}$,
has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 1)}}{QRST}$.

b. The difference between two means of dimension t, $\bar{X}_{\cdot\cdot\cdot t\cdot} - \bar{X}_{\cdot\cdot\cdot t'\cdot}$,
has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 2)}}{QRSU}$.

c. The difference between two means of dimension s, $\bar{X}_{\cdot\cdot s\cdot\cdot} - \bar{X}_{\cdot\cdot s'\cdot\cdot}$,
has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 3)}}{QRTU}$.

Variances of the differences between means of r take analogous forms.

d. The difference between two means of dimension u for a particular
level of t, $\bar{X}_{\cdot\cdot\cdot tu} - \bar{X}_{\cdot\cdot\cdot tu'}$, has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 1)}}{QRS}$.

Variances of the differences between means of dimension u for particular
levels of r and s take analogous forms.

e. The difference between two means of dimension t for a particular
level of u, $\bar{X}_{\cdot\cdot\cdot tu} - \bar{X}_{\cdot\cdot\cdot t'u}$, has a variance which is estimated by

$\dfrac{2[(U - 1)\,\mathrm{M.S.(Error\ 1)} + \mathrm{M.S.(Error\ 2)}]}{QRSU}$.

f. The difference between two means of dimension s for a particular
level of u, $\bar{X}_{\cdot\cdot s\cdot u} - \bar{X}_{\cdot\cdot s'\cdot u}$, has a variance which is estimated by

$\dfrac{2[(U - 1)\,\mathrm{M.S.(Error\ 1)} + \mathrm{M.S.(Error\ 3)}]}{QRTU}$.

Variances of the differences between means of r for a particular level of u
take analogous forms.

g. The difference between two means of dimension s for a particular
level of r, $\bar{X}_{\cdot rs\cdot\cdot} - \bar{X}_{\cdot rs'\cdot\cdot}$, has a variance which is estimated by

$\dfrac{2\,\mathrm{M.S.(Error\ 3)}}{QTU}$.

Variances of the differences between means of r for a particular level of s
take analogous forms.
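The variances a through g for the second system can be gathered the same way. The sketch below is an illustration with hypothetical names and inputs. The mean squares are set equal to their expected values under the assumption that $\rho_1$ applies to different levels of u within a level of t and $\rho_2$ to different levels of t, which allows the pooled cases e and f to be compared with what the model implies directly.

```python
def variance_of_differences_2(ms1, ms2, ms3, Q, R, S, T, U):
    """Estimated variances of the typical differences a to g for the second system."""
    return {
        "a": 2 * ms1 / (Q * R * S * T),
        "b": 2 * ms2 / (Q * R * S * U),
        "c": 2 * ms3 / (Q * R * T * U),
        "d": 2 * ms1 / (Q * R * S),
        "e": 2 * ((U - 1) * ms1 + ms2) / (Q * R * S * U),
        "f": 2 * ((U - 1) * ms1 + ms3) / (Q * R * T * U),
        "g": 2 * ms3 / (Q * T * U),
    }


# Illustrative values only: mean squares equal to their expectations.
sigma2, rho1, rho2 = 2.0, 0.5, 0.2
Q, R, S, T, U = 2, 2, 2, 3, 4
ms1 = sigma2 * (1 - rho1)
ms2 = sigma2 * (1 + (U - 1) * rho1 - U * rho2)
ms3 = sigma2 * (1 + (U - 1) * rho1 + U * (T - 1) * rho2)
v = variance_of_differences_2(ms1, ms2, ms3, Q, R, S, T, U)

# Values implied directly by the assumed covariance structure.
direct_e = 2 * sigma2 * (1 - rho2) / (Q * R * S)
direct_f = 2 * sigma2 * (1 + (T - 1) * rho2) / (Q * R * T)
```

In this example v["e"] agrees with direct_e and v["f"] with direct_f, so the pooled combinations of error mean squares recover the variance of a difference between single-cell means taken on the same subjects (case e) or on different subjects (case f).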


Systems of Lower Dimensionality

Due to limitations on space it will not be possible to examine special
cases of the previous systems wherein the number of dimensions is less than
four. It is suggested that systems of lower dimensionality be viewed as cases
of the preceding systems in which certain subscripts, say r, s, or t, have been
deleted and Tables 1 and 2 be altered to take care of these setups. Thus, all
parameters in Table 2 containing a deleted subscript will be omitted from
analysis and the sums of squares and degrees of freedom which remain will be
altered by striking out either the deleted subscripts or corresponding limits
of summation as the case might be. Tests of hypotheses are made in the same
manner as before.
Problems in Interpretation

In classification systems, whether the levels of the dimensions are
experimental or not, the difficulties encountered in interpreting the results
of an analysis usually increase when the number of dimensions is increased.
This is particularly true when it is desired to explain the meaning of significant
higher-order interactions. Methods useful in the explanation of significant
higher-order interactions (see [12]) for systems in which the observations are
assumed to be uncorrelated may be used also in the setups presented in this
paper.

Normally it is necessary to compare individual means in order to explain
both significant main effects and interactions, this being accomplished by
tests of significance such as multiple range tests. In the previously presented
models, however, this task becomes more complicated since the analyses
contain more than one error variance. Although the variances of differences
between particular means which previously have been given may be used in
such comparisons, it should be pointed out that the degrees of freedom for
variances which involve more than one error mean square may be approximated
by means of methods given by Taylor [16].
REFERENCES 


[1] Binder, A. The choice of an error term in analysis of variance designs. Psychometrika, 
T." 
2] тогы с апа rimental designs. (2nd ed.) 
1957. — 
3] Collier, R. O., Jr. Experimental 
correlated. Unpublished doctora CI. 
[4] Edwards, A. L. Experimental design in 


1950, 
ression theory 1n the 


math. Statist., 1951, 22, 518 0 a siii od 
n of two methods of instru 


New York: Wiley, 


Cox, G. M. Expe 
designs in which the observations are assumed to be 


i ion, Univ. Minnesota, 1956. } 
, dieere agat research. New York: Rinehart, 


presence of intraclass correlation. Ann. 


variance. Commun. pure 


7. n in 
appl. Math., 1955, 8, 41-97. У й opui 
1952, 20, 265 7 Fork: Wiley, 1952. 


beginning drawing. J. 62р. "ments, New Y Ч 
t Kempthorne, О. The design s. Psychol. Bull., 1948, 45, 
9] Kogan, L. S. Analysis of уагіапе 


131-143. 


[10] Kogan, L. S. Variance designs in psychological research. Psychol. Bull., 1953, 50,
1-40.
[11] Kramer, C. Y. Extension of multiple range tests to group correlated means. Biometrics,
1957, 13, 13-18.
[12] Lindquist, E. F. Design and analysis of experiments in psychology and education.
Boston: Houghton Mifflin, 1953.
[13] Moonan, W. J. Simultaneous examination and method analysis by variance algebra.
J. exp. Educ., 1955, 23, 253-257.
[14] Moonan, W. J. An analysis of variance method for determining the external and
internal consistency of an examination. J. exp. Educ., 1956, 24, 239-244.
[15] Nandi, H. K. A mathematical set-up leading to analysis of a class of designs. Sankhyā,
1947, 8, 172-176.
[16] Taylor, J. The comparison of pairs of treatments in split-plot experiments. Biometrika,
1950, 37, 443-444.
[17] Wilks, S. S. Mathematical statistics. Princeton: Princeton Univ. Press, 1950.
[18] Yates, F. The principles of orthogonality and confounding in replicated experiments.
J. agric. Sci., 1933, 23, 109-145.
[19] Yates, F. Complex experiments. Suppl. J. roy. statist. Soc., 1935, 2, 181-223.
[20] Yates, F. The design and analysis of factorial experiments. Imp. Bur. soil Sci., Tech.
Comm. No. 35, Harpenden, England, 1937.


Manuscript received 5/2/57
Revised manuscript received 1/17/58



PSYCHOMETRIKA—VOL. 23, NO. 3
SEPTEMBER, 1958


THURSTONE'S ANALYTICAL METHOD FOR SIMPLE 
STRUCTURE AND A MASS MODIFICATION THEREOF* 


ROBERT R. SOKAL
UNIVERSITY OF KANSAS

The analytical method for simple structure proposed by Thurstone is
applied to four cases and found to yield satisfactory results. The
simple structure obtained by Thurstone's method is found to match closely
that obtained by other methods and corresponds to the true structure of the
matrix in those cases where true structure is known. Difficulties about the
choice of the correct trial vector led the writer to develop a modification of
Thurstone's method, useful where high speed computational facilities are
available. Instructions are given for this so-called mass modification, and
the procedure is illustrated with a 5-factor, 14-variable example. While the
results do not fully correspond to a previous graphical solution, it can be
argued that the results obtained by the new method show an improved
simple structure. The modified method is applied to three other correlation
matrices, yielding in each case a satisfactory simple structure.


The purpose of this paper is: (1) to give several examples of the
application of Thurstone's analytical method for finding simple structure in a
factor matrix; (2) to compare the results of Thurstone's solution with those
from other analytical solutions as well as with results of trial-and-error
graphic rotation; (3) to introduce a mass modification of Thurstone's method,
which is believed to be superior where high speed computational devices
are available.

The student or investigator entering the field of factor analysis at this
time need no longer be dismayed by the prospect of seemingly endless
trial-and-error rotation of reference vectors interspersed with hours of tedious
desk calculator computation. A number of analytical solutions for simple
structure have been proposed by several authors. These methods determine
a simple structure constellation by arithmetic rather than geometric operations
and appear to have removed the trial-and-error element from the procedure.


e Department of Ento! 


" ‚саран No. 961 from th Piit 
wrence, Kansas. А 5 is based was регіогте uring the summer 
TRIPS хор on vi tire eth Nem мүч aed a 
granted by "he зе auto University D t Association. nA spend the tenure of this 
at the University of Illinois where the author was um lege to et ehairman of the Psy- 
Scholarship. The writer is indebted to Professor ^ 3 equipment at his disposal, and to 
chology Department, who graciously placee. fing encour: d interest in the 
Professor Raymond B. Cattell for bis. continuing iri 

author's work. Some of the computations 

ended ! 


the field of factor analysis at this 
f seemingly endless trial- 
1 with hours of tedious 


mology, University of Kansas, 


Puer. Th ies ext tefully acknowledged. The 

т т А of the University 0! during the development of 

Writer is indebted to Mr. John R- Hurley fo К ding of this paper. Expenses In connection 
е computational routines and for a roh Grant of the niversity of Kansas. 


With the work were met by a Genera 
237 


Computational labor required for these solutions is generally quite heavy
but a parallel development, the advent of high speed electronic computers,
is eliminating this factor for those investigators fortunate enough to have
access to such a machine. Thus the decision on whether to rotate to simple
structure may in future be based mainly on the intrinsic merit of that structure
for a particular research problem rather than on whether an investigator
has the time or fortitude to undertake a search for simple structure by the
traditional methods.


Analytical solutions were published almost simultaneously by a number 
of authors. Carroll [1] proposed a solution based on the minimization of the 


sums of the cross products of squares of factor loadings. This method yields 
orthogonal or oblique factors, as desired. Saunders [8] published a solution 
based on a maximized kurtosis 


of factor loadings but with factors restricted 
to orthogonality. Pinzka and Saunders [7] extended this solution to the 
Neuhaus and Wrigley [6] developed another 
maximizing the sum of fourth powers of factor 
hese methods in the case of orthogonal factors 
n [3] working on an objective definition of 
: в. Tucker's method [13] is 
п | inear constellations of test vectors, 
Thurstone's analytical method [12], for simplicity’s sake hereafter 


referred to as TAM, uses a set of arbitrary weights which adjust trial vectors 
into positions perpendicular to such hyperplanes 


method is briefly reviewed in the following sectio 


discussed in Thurstone’s Paper [12], the writ 
published examples of the 

listed above, TAM арреаге 
mixed desk-calculator, digi 
his own data the writer tried i 


an investigator 
ch for simple structure by the 


a computational 


Point of view of using а 
efore applying TAM to 
Т matrices and compared 
ructure of the matrix or 


Subsequently developed 
When high speed compu- 
multiplication and inver- 


tion of this paper and four 
е given, 
*A mimeographed report by Rolf Bargm, Р 

lytical methods inr the determination of simple strasno А comparison of new ana- 
to the El the author after this manuscript had bean o piper 1953, was brought 
publication. I p) applied TAM to the 21-variable -facte eted and submitted for 
of TION f TAM bad Analysis. A satisfactory sim, Е Fo matrix from p. 402 
improvements o ased on Thurstone’s single эу. 01е Structure resulted. Several 
Bargmann in his report. Planing method were suggested by 


Brief Review of Thurstone's Analytical Method (TAM)

TAM starts with an unrotated centroid factor matrix $F_0$ of dimensions
$n \times k$, representing n variables and k extracted factors. A trial vector is
chosen from one of the variables of the study, and the row of factor loadings
of variable i in matrix $F_0$ is transposed into a column vector $J_i$ of direction
numbers. Column vector $J_i$ is then normalized to yield column vector $P_i$,
which changes the direction numbers into direction cosines.

Next the column vector of projections $v_{ip}$ of normal-length test vectors
on unit trial vector $P_i$ is computed,

$v_{ip} = F_0 P_i$.

A weight $w_i$ is assigned to each $v_{ip}$ according to the table of weights given
in Thurstone [12] or, in some cases to be described below, according to a
scheme developed separately for each reference vector to be determined.
These weights form the principal diagonal of a diagonal matrix W of order
n. A weighted factor matrix $F_w$,

$F_w = W F_0$,

is computed and a symmetric matrix A is then obtained,

$A = F_w' F_w$.

This symmetric matrix is of order k. The next step is the computation of
$A^{-1}$. This matrix is used to compute column vector U,

$U = A^{-1} P_i$.

U is normalized to yield $\Lambda_i$, the direction cosines of the first reference vector.
The column vector $\Lambda_i$ defines the adjusted position of the initial trial reference
vector $P_i$. Loadings of the n variables on the new reference vector $V_{i\Lambda}$
are computed by

$V_{i\Lambda} = F_0 \Lambda_i$.

A second trial vector is chosen from among variables that are in the
hyperplane of the first reference vector, a third from those in the hyperplanes
of both the first and the second, and so forth. Occasionally convergence
toward the simple structure solution may be slow, and it will be necessary to
iterate the procedure using $\Lambda_i$ as the new unit trial vector. In practice we
start with the column of $V_{i\Lambda}$, weighting it and continuing as before.

As will be seen below, a trial vector may be chosen and adjusted into a
reference vector that has already been determined (a disheartening experience
in the original TAM since it does not become apparent until the
tedious computation of the inverse matrix has been completed). This was
one of the causes leading up to the development of the mass modification.

Examples of the Application of Thurstone's Analytical Method

The Cattells' Boys

The first factor matrix tested is an artificial one simulating the results
of manipulation tests of seven mechanical puzzles (variables) where success
is determined by three factors. It was developed by Cattell and Cattell [2]
in connection with their method of parallel proportional profiles. They
prepared two similar matrices, one to represent the factor structure of boys,
the other that of girls. Only the former was employed in the present study.

The unrotated factor matrix $V_{0a}$ ([2], Table III) was used as the initial
$F_0$ matrix. Variable 7 was chosen as the first trial vector. This yielded the


TABLE 1 


ariables) where success 
eloped by Cattell and Cattell [2] 


Comparison of Simple 


Structure Solutions for Cattells' Boys: 
TAM, Graphic Rotati 


9n, and Original (Known) Factor Matrix 


Reference trix Original TAM Graphic 
vector VariabT 


rotation* 
I 1 60 [7 65 
2 -50 -h3 -45 
3 -10 -1} E 
: 00 -06 00 
90 92 92 
$ -60 -58 -65 
T 05 06 -01 
= -80 -Bh -83 
E 80 79 80 
50 he 45 
2 -10 06 01 
E: 00 -01 -01 
d eo 10 06 
III 1 ET EA FE 
- 10 05 1l 
I 20 02 -0l 
5 p -59 -63 
i 00 08 o8 
1 70 67 67 
30 80 8i 


tell and Cattell [2 
ES Of variables 2 and 3 for ali m ue 7 
vectors have been interchanged . Thi 


left half of table VI of that Paper. The present 
one according to a personal communi cat: 


version 
ion of Prop is the correct 


* R. B. Cattell. 


Cattells' reference vector III. Variable 3 had the lowest loading on that
reference vector and was used as the second trial vector yielding in turn
reference vector II of the Cattells. The final trial vector was clearly variable
5, since it was the only variable located in the hyperplanes of both previous
RV's. It yielded the first RV of the previous authors. The weights used were
the original ones tabled by Thurstone ([12], Table 1). Since the initial factor
matrix was specially designed to represent simple structure and consequently
exhibited a wide range in the magnitude of the factor loadings it is not
surprising that the original weight scale was adequate.

The result of the above procedure is shown in Table 1. The graphic
solution and TAM are almost identical and both closely resemble the original
structure of the factor matrix. Correlations between the TAM reference
vectors are very low ($r_{I,II} = .136$, $r_{I,III} = .096$, $r_{II,III} = .061$) returning
almost completely to the initial orthogonal position. In this particular
instance TAM appears to give entirely satisfactory results.

The Johnson-Reynolds Data

The next example is a small matrix which had been used previously by
two other workers in demonstrating their own analytical solutions for simple
structure (Carroll [1], Pinzka and Saunders [7]). The matrix had been originally
computed from ten verbal tests administered to 113 students by Johnson
and Reynolds [5] and contains 10 variables (the tests) and two factors.
TAM was applied to the factors using variable 5 as a first trial vector.
This resulted in a factor similar to factor I of Carroll and Saunders. Since
the hyperplane was determined by only one variable the problem was iterated
using the $V_{i\Lambda}$ column as a new $V_{ip}$ column. The second factor was found via
variable 2 which lay in the hyperplane of factor I. Thurstone's weights
([12], Table 1) had to be modified somewhat since their original жаз 
would not have given enough Шо. roe die suggestion qm 
E ».. values be divided into : equi 
p. 180) that the total range of Vip V «signed values of from 0 for the highest. 
intervals and that these intervals be ppp was found to be quite 
i у for rest such los a 
absolute loading to 6 for the lowest 5 vas 1 ; 
«бъра ше such às Wright's chicken bones discussed the , Я 
range wis ns such as to warrant division into seven classes, and weight 
classes of from 0 to 4 were assigned this and pet pue p» 
Table 2 compares the findings by Lc "s 3 а ا‎ 
n by TA? mili 
на s [1,7 an be seen that the solutio TAM i 
me ip Qs i dn phas The iteration of factor I per the ора 
for ed ae considerably closer to the loadings found Py E " E 
su d U less one can agree on d common optimal erit e a dione 
uin it is = oesitile to compare and evaluate ine ш E «ра is 
s [7] sho lution is the better by the 
Ed b Data Le: value (Pinzka 


their solu The f and Saunders’ 
Carroll’s is the better by his criterion. 




TABLE 2 


Comparison of Three Analytical Methods for Simple Structure 
Applied to Johnson-Reynolds Data 


Vi 


lethod Carroll Pinzka and TAM 


vector eriable Saunders (lst iteration) (2nd iteration 
Ие Е Б сва ванын 
2 -099 -108 007 -091 
3 064 056 И ото 
4 315 303 шо 325 
5 638 627 139 646 
6 608 597 n2 616 
m 629 620 715 636 
Ж: 57} 568 638 580 
9 578 568 681 587 
10 281 267 426 292 
15 1 538 532 540 
2 436 L27 439 
265 85h 269 
| °% 23l 265 3 
5 -017 -ощ -006 E 
6 010 -017 021 Б 
F -066 -092 -056 i 
5 M3 -136 -105 
9 021 -005 032 
xi 309 227 359 


criterion) for the ТАМ 50 


lution is -0495, the Worst of у 

' the t ther 

hand the correlation between the reference vectors is — n * ida 
Saunders’ solution while it is only — 4862 for TAM "With ош id ing 
the other analytic bas s PIQUE 


al solutions it may be said that 1 : is- 
factory solution in this instance, TAM did provide a satis 




Wright's Chicken Bones 

This is a biological correlation matrix based on six measurements of the
bones of 276 chickens. These data were first published by Wright [14], who
analyzed them by a method analogous to factor analysis. His findings were
revised in a later paper [15]. The correlations appear to be accounted for by
three common factors [10, 14, 15]. These data have been rotated to simple
structure from a centroid factor matrix twice independently by the writer
and his associates. The first two columns of Table 3 show these independent
solutions. A uniform numbering system has been adopted for the reference
vectors. Reference vector I appears to represent a head factor, while reference
vectors II and III clearly represent wing and leg factors, respectively. From
Table 4 which shows the correlations between reference vectors we note that
the wing and leg factors are almost collinear.

The analysis by TAM presented some difficulty. The use of Thurstone's
original weights with variable U resulted in an essentially unrotated but
reflected factor III rather than in factor II as expected. This difficulty could
be overcome by recoding the weights based on the range of factor loadings
as discussed in the previous section. An alternative procedure in this
six-variable case was simply to array the $v_{ip}$ values by order of absolute magnitude,
assign weight 0 to the largest value and 1 to 5 to values of decreasing magnitude.

When these revised weighting techniques were applied to the $V_{ip}$ column
vector obtained from variable U, the resultant A matrix could not be inverted
during a first attempt on a desk calculator, using the customary Dwyer
technique and carrying four decimal places. Investigation disclosed that the
determinant of the A matrix was close to zero ($|A| = .013$), apparently due
to the high correlation between the wing and leg factors. When the inversion
was repeated on the ILLIAC in 12-decimal arithmetic, a solution was found
which finally yielded reference vector II of the graphic solutions.

The next variable chosen was F, which is in the hyperplane of reference
vector II. It yielded previous reference vector III. Again the determinant
of the A matrix was very small requiring five-decimal arithmetic. The final
trial vector was based on variable B which had been in the hyperplanes of
both reference vectors found so far. This yielded reference vector I of the
previous analyses. In view of the low correlation of this reference vector
with the other two, the determinant of the A matrix was considerably larger
than zero.

A glance at Table 3 will convince the reader of the similarity of TAM
with the two graphic solutions. Table 4 compares the
solutions in terms of the correlations between pairs of reference vectors. In
spite of initial difficulties TAM again produced essentially the same simple
structure as the graphic method.




TABLE 3

Comparison of Simple Structure Solutions for Wright's Chicken
Bone Data: Two Graphic Solutions and TAM

Reference                         Method
vector      Variable    Graphic #1   Graphic #2    TAM

I           L               40           47         48
            B               50           54         49
            H               09           05         08
            U               02           02         00
            F               01           03        -01
            T               06           08         01

II          L               01           09         03
            B              -0?          -01        -02
            H               27           32         29
            U               33           33         35
            F              -01           01         01
            T              -02           01         00

III         L              -04           00          ?
            B               00           04         01
            H              -01           10         02
            U              -04           10        -01
            F               30            ?         33
            T               ?9           40         32

Explanation of variables: L = head length, B = head breadth, H = humerus
length, U = ulna length, F = femur length, T = tibia length.
(? = digit or entry illegible in the source.)


TABLE 4

Correlations between Reference Vectors
of Simple Structure Solutions of Table 3

                  Correlation coefficients
Solution         I, II     I, III    II, III
Graphic #1       -.22      -.21        ?
Graphic #2       -.22      -.18        ?
TAM              -.21      -.21        ?

(? = entry illegible in the source.)




Stroud's Termite Soldiers 

These data, published by Stroud [11], are based on a correlation matrix
involving 14 physical measurements of soldiers of 48 species of the termite
genus Kalotermes. The correlations were factored by the centroid method;
the unrotated $F_0$ matrix is given in Table II of Stroud's paper. Five factors,
extracted from the correlations, were rotated by a trial-and-error graphic
method. The rotated solution is given in Table V of Stroud [11]; the plots
of pairs of reference vectors are in Figure 3 of the same article.

The writer obtained only one reference vector from these data by TAM.
During the analysis of this example the idea of a mass modification came
to the writer, and it was decided to carry out the mass analysis of Stroud's
data reported in this paper. The single trial vector used was based on variable
12, following Thurstone's [12] instructions that the first trial vector should
be based on a test (variable) which has some relatively high correlations in
the correlation matrix and which also has an appreciable number of relatively
low coefficients. However, the reference vector obtained showed only a
poorly defined hyperplane and did not match any of the reference vectors
found by Stroud. This would appear to be the first failure of TAM to yield
a reference vector corresponding to one obtained graphically during a first
trial. A discussion of this case will be deferred until the discussion of the
analysis of the Stroud data by the mass modification of TAM.


A Mass Modification of Thurstone’s Analytical Method 


Description of the Procedure 


The mass modification of TAM was developed in response to several
problems which arose during the application of this method to the matrices
of the preceding section. Some of these problems were the following.

(1) Occasionally a given trial vector would not yield a reference vector
with a well-defined hyperplane.

(2) Sometimes a given variable would result in a well-defined reference
vector, but a second variable would have given an even clearer simple
structure. It might be argued in such a case that iteration of the reference
vector from the first variable would also have yielded better definition.
However, it is of course desirable to limit the number of iterations to an
essential minimum.

(3) At times the prescribed procedure for finding new trial vectors in
TAM produced two collinear reference vectors and lost one of the significant
reference vectors in the study. This particular topic is discussed further in
Stroud's termite soldier example below.

The essentials of the mass modification of TAM, which will subsequently
be referred to as MTAM, are the simultaneous use of every variable as a trial
vector. Thus from an $n \times k$ unrotated factor matrix, $F_0$, where $n$ is the
number of variables and $k$ is the number of factors, $n$ trial vectors and
consequently $n$ reference vectors are computed, which however represent only
$k$ dimensions. The advantages of this method are as follows.


(1) A uniform treatment of every factor matrix without specific decisions
to be made before each reference vector is established.

(2) The best definition for a given reference vector can be chosen from
among those provided by several trial vectors.

(3) A choice can be made about the obliqueness of reference vectors
since a $\Lambda$ matrix for all $n$ reference vectors will be obtained.

The advantages of MTAM are, however, predicated upon the availability
of high speed computing machines since they involve for each iteration the
computation of inverse matrices of order $k$.

The computational procedure for MTAM is given below.

(1) The procedure starts with the unrotated factor matrix $F_0$ of order
$n \times k$.

(2) $F_0$ is normalized by rows to $F_N$. This gives all possible unit trial
vectors $F_N$.

(3) Compute

$$V_{jp} = F_0 F_N',$$

where $V_{jp}$ is of order $n \times n$. This yields all possible column vectors
$v_{jp}$. Each column of $V_{jp}$ is a column vector $v_{jp}$.

(4) Weights are assigned separately to each $v_{jp}$ based on the magnitude
and range of the $v_{jp}$ values as described above. A routine procedure which
can be given to a clerk or built into a computer routine can be easily set up.
The weights are set up as $n \times n$ diagonal matrices $W_1$ through $W_n$.

(5) Matrices $W_1$ through $W_n$ are postmultiplied by $F_0$. There are, of
course, $n$ such multiplications, which yield $n$ matrices (of order $n \times k$)
$F_{w1}$ through $F_{wn}$. In general

$$F_{wi} = W_i F_0.$$

(6) Premultiply $F_{wi}$ by constant matrix $F_0'$ to give $n$ symmetric matrices
of type $A_i$ (of order $k \times k$).

$$A_i = F_0' F_{wi}.$$

(7) Compute inverses $A_i^{-1}$ for the $n$ matrices $A_i$.

(8) Split $F_N'$ into $n$ column vectors of type $F_{Ni}'$.
Then postmultiply the $n$ inverse matrices by the corresponding column
vectors $F_{Ni}'$ to obtain $n$ column vectors $U_i$,

$$U_i = A_i^{-1} F_{Ni}'.$$




(Some workers with TAM have preferred to solve the simultaneous equation
$AU = F_N'$ for $U$ instead of the two-step procedure originally formulated by
Thurstone and followed here. The choice is entirely one of convenience
and availability of programs for a given computer.)

(9) Assemble the $n$ column vectors $U_i$ into a single matrix $U$ and
normalize by columns to matrix $\Lambda$ (of order $k \times n$).

(10) The final step consists of computing the $n \times n$ reference vector
matrix $V_{j\Lambda}$,

$$V_{j\Lambda} = F_0 \Lambda.$$
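Steps (1) through (10) may be easier to follow as a short program. The following is only a minimal sketch in Python, not the routine used by the writer. Two things in it are assumptions: the weighting function `rank_weights` uses the simple rank scheme mentioned for the six-variable chicken bone case (weight 0 for the largest absolute $v_{jp}$, then 1, 2, ... for decreasing magnitude) in place of the weights "described above", and each $U_i$ is found by solving $A_i U_i = F_{Ni}'$ directly rather than by forming $A_i^{-1}$. All function names are hypothetical.

```python
import math


def normalize_rows(F):
    """Step (2): scale every row of F to unit length, giving the trial vectors."""
    out = []
    for row in F:
        length = math.sqrt(sum(x * x for x in row))
        out.append([x / length for x in row])
    return out


def rank_weights(v):
    """ASSUMED weighting: 0 for the largest |v|, then 1, 2, ... as |v| decreases."""
    order = sorted(range(len(v)), key=lambda j: -abs(v[j]))
    w = [0.0] * len(v)
    for rank, j in enumerate(order):
        w[j] = float(rank)
    return w


def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [A[r][:] + [b[r]] for r in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    u = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(M[r][c] * u[c] for c in range(r + 1, k))
        u[r] = (M[r][k] - s) / M[r][r]
    return u


def mtam_iteration(F0, weight_fn=rank_weights):
    """One MTAM pass: returns Lambda (k x n) and V = F0 * Lambda (n x n)."""
    n, k = len(F0), len(F0[0])
    FN = normalize_rows(F0)
    columns = []
    for i in range(n):
        t = FN[i]                                    # trial vector from variable i
        v = [sum(F0[j][a] * t[a] for a in range(k)) for j in range(n)]   # step (3)
        w = weight_fn(v)                             # step (4), diagonal of W_i
        A = [[sum(w[j] * F0[j][a] * F0[j][b] for j in range(n))          # steps (5)-(6)
              for b in range(k)] for a in range(k)]
        u = solve(A, t)                              # steps (7)-(8), solved directly
        length = math.sqrt(sum(x * x for x in u))
        columns.append([x / length for x in u])      # step (9), unit columns
    Lam = [[columns[i][a] for i in range(n)] for a in range(k)]
    V = [[sum(F0[j][a] * Lam[a][i] for a in range(k)) for i in range(n)]  # step (10)
         for j in range(n)]
    return Lam, V
```

Looping over the trial vectors one at a time, as here, also corresponds to the storage-saving variant in which the weight matrices are entered singly and steps (5) through (8) are carried out separately for each trial vector.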


The writer performed these procedures step by step simultaneously for
seven different factor matrices (in about 3 hours computer time plus 3 days
handling time, discounting errors made as the technique was being perfected).
However, it would seem preferable to program the entire sequence as a
single routine. This would then permit iterations. Machine output should
include $V_{j\Lambda}$ and $\Lambda$ (or preferably $\Lambda'\Lambda$). In the case of matrices of appreciable
size, such as 50 tests (variables) and 10 factors, storage facilities on most
computers would be inadequate to handle all the steps in a single procedure.
In such a case the $n \times n$ weight matrices $W_1$ through $W_n$ would have to be
entered singly into the machine and steps (5) through (8) would have to be
performed in sequence separately for each trial vector. This would also permit
dropping out any trial vector if the operator so desired. Subsequent procedures
for finding optimal simple structure are discussed in the next section in
relation to a concrete example.

The Mass Modification Applied to Stroud's Data

The 14-variable, 5-factor matrix of termite soldier measurements by
Stroud [11] mentioned previously was subjected to the MTAM procedure
just described. The matrix was completely processed once as an experimental
problem to study the feasibility of MTAM. After the method had been
found to be effective new weights were assigned to the resulting $V_{j\Lambda}$ matrix
and the problem was iterated. Table 5 shows $V_{j\Lambda}$ of this second iteration. It
gives the factor loadings (correlations with the reference vectors) based on
the fourteen trial vectors corresponding to the fourteen variables. Table 6
shows the correlations between these fourteen reference vectors calculated by

$$C_\Lambda = \Lambda'\Lambda.$$

The problem is to select from the fourteen reference vectors five which
will yield a satisfactory simple structure. At the end of an MTAM
analysis one is, of course, able [...] Thurstone's [12] original [...]
simply and elegantly [...] with a likely trial vector;
then using a variable in the hyperplane [...] followed by a variable
in the hyperplane of both reference [...] All that is needed [...]



TABLE 5

$V_{j\Lambda}$ Matrix (Correlations of Variables with Reference Vectors)
after Second MTAM Iteration to Stroud's Centroid
Factor Matrix for Termite Soldiers


Reference vectors 
Variables Жы ышты LL REALI VA ie ——— 
ot Tum m an mm 
i 32 11 07 21 -10 32 11 00 07 17 
2 1} 57 se o8 i -05 0% 02 05 о 
00 62 6h -06 og 15 -08 оо 05 


3 
4 2З 01-01 33 20 16 -17 св 39 32 «43 08 3 о 


10 21 00.01.39 29 ig 

ч OS ой а ы oe 17 27 Ob 32 oh oh ob 

ag 08-01-08 18 оз o 9* 28 02 34 o 49 19 -18 

13 23 -10 -12 25 10 05 -03 o2 Oh o OT o8 27 18 

Б Bh 02 оз 10 و‎ 70 00-09 i3 ay 18 -2 ih ho 
Number of 


variables 
in hyperplane 8 8 10 6 


6 8 8 7 5 


proceeds somewhat [...] the original TAM which [...]
reference vector to reference vector will soon fail to appeal [...]
of possible RV choices becomes available.

Thus the outcome of [...] TAM computational procedure [...]
added information, including the number of variables in [...]
all possible reference vectors as well as all mutual correlations
between these vectors. It should


TABLE 6

Matrix $C_\Lambda$ (Correlations between Reference Vectors) of Table 5

[Table printed sideways in the original; its entries are not recoverable from the scan.]



be possible from this information to develop a procedure leading to a unique 
and optimal set of reference vectors. 

Considerable time and effort have been spent by the writer on developing
a simple "cookbook" procedure for obtaining an optimal simple structure
using $V_{j\Lambda}$ and $C_\Lambda$ of MTAM. Regrettably, no such simple routine valid for
all investigated examples could be devised. However, a series of steps may
be outlined which in the cases tested has yielded satisfactory results.

(1) Examine $C_\Lambda$ (and also check $V_{j\Lambda}$ and $\Lambda$) for pairs or groups of highly
correlated ($|r| \geq .8$) reference vectors. These are usually based on closely
related variables (in psychological matrices tests measuring nearly identical
psychological functions). Some method of cluster analysis of $C_\Lambda$ is helpful
in this procedure. In the matrix of Table 6 the following nonoverlapping
clusters could be isolated considering only correlation coefficients larger than
0.8: 2-3, 5-6-7; overlapping clusters: 1-4-13, 4-8-9-10-11, 8-13, 10-13.
The variables in a given cluster usually represent a certain reference vector.
The choice of which variable is to represent the factor is made at a later
stage, except in cases such as those described below where the reference
vectors in a cluster are essentially collinear (e.g., reference vectors O and P
in the housefly data were correlated at $r = .999$). In such cases it is, of course,
immaterial which reference vector is used to represent the cluster.

(2) Count the low correlations (| r | < .2) of each reference vector with 
other reference vectors (see Table 6). Below is shown the code number of 
each reference vector followed by the number of vectors with which it shows 
low correlations: 


1-1 8-0 
2-5 9-3 
3-4 10-0 
4-1 11-4 
5-3 12-2 
6-3 13-1 
7-3 14-2 


In some cases, such as Wright's chicken bones discussed above, correlations
as low as .2 do not exist. In such circumstances the definition of a low
correlation must be altered to permit the formation of groups of relatively
unrelated reference vectors.

(3) Count the number of variables in the hyperplane of each reference
vector (factor loadings between -.10 and +.10). This has been done for
the Stroud matrix at the foot of Table 5.
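The three counts in steps (1) through (3) are mechanical and can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, `Lam` is assumed to be the $k \times n$ matrix $\Lambda$ with unit-length columns, `V` the matrix $V_{j\Lambda}$ with one column per reference vector, and the cutoffs are the ones quoted in the text (.8, .2 and .10). The later choice among vectors still rests on the investigator's judgment.

```python
def reference_vector_correlations(Lam):
    """C = Lambda' Lambda for a k x n matrix Lam whose columns have unit length."""
    k, n = len(Lam), len(Lam[0])
    return [[sum(Lam[a][i] * Lam[a][j] for a in range(k)) for j in range(n)]
            for i in range(n)]


def high_correlation_pairs(C, cutoff=0.8):
    """Step (1): pairs of reference vectors (numbered from 1) with |r| >= cutoff."""
    n = len(C)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if abs(C[i][j]) >= cutoff]


def low_correlation_counts(C, cutoff=0.2):
    """Step (2): for each reference vector, how many others have |r| < cutoff."""
    n = len(C)
    return [sum(1 for j in range(n) if j != i and abs(C[i][j]) < cutoff)
            for i in range(n)]


def hyperplane_counts(V, band=0.10):
    """Step (3): per reference vector (column), variables with |loading| <= band."""
    rows, cols = len(V), len(V[0])
    return [sum(1 for j in range(rows) if abs(V[j][i]) <= band) for i in range(cols)]
```

The pairs returned by `high_correlation_pairs` are only the raw material for the clusters; grouping them into overlapping and nonoverlapping clusters is left to whatever method of cluster analysis the investigator prefers.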

Now proceed to select the reference vectors for a simple structure in
this matrix. Our eventual criteria for simple structure will be the five
requisites listed by Fruchter ([4], p. 110), namely:

(a) Each variable should have at least one loading close to zero.

(b) There should be, for each factor column, at least as many tests with
zero loadings as there are factors.

(c) For every pair of factors there should be several variables with
projections on one factor vector but not on the other.

(d) For problems having four or more factors, a large proportion of
the variables should have negligible loadings on any pair of factors.

(e) Only a small number of variables should have appreciable loadings
on any pair of factors.

However, attention is first directed to securing a large number of zero
or near zero loadings for each reference vector adopted and low correlations
between reference vectors whenever possible.

Array all of the reference vectors in accordance with their suitability
with respect to the number of variables in the hyperplanes of the reference
vectors: 3, 8, 1-2-6-9-11-12, 13, 4-5-7-10, 14 (cf. bottom line of Table 5).
When examined for the largest number of low correlations with other
reference vectors the following array by suitability resulted: 2, 3-11, 5-6-7-9,
12-14, 1-4, 13, 8-10 (cf. Table 6).

Checking back with the high correlation clusters of reference vectors,
only one representative each of the following two clusters should be chosen:
2-3 and 5-6-7. Reference vector 2 was chosen over 3 arbitrarily since the
two are almost collinear. Reference vector 6 seems slightly more suitable
to represent 5-6-7 since it has two more variables in its hyperplane than
5 or 7. Reference vector 1 was chosen to represent 1-4-13 by similar
reasoning. In cluster 4-8-9-10-11 the choice fell on 11 as the best representative
by both criteria rather than 8 which, while better by the first array, was
much worse by the second. Variables in the remaining overlapping clusters
(8-13, 10-13) having been represented already, the fifth reference vector
was chosen from variables 12 and 14 which had not entered into any cluster.
These two reference vectors were quite highly correlated with each other
($|r| = .72$); 12 was chosen over 14 by referring to the suitability arrays.
It is admitted that this procedure lacks a unique solution, but the
subjective decisions required are usually between quite similar alternatives.
Thus the result would not have been appreciably different if reference vector
5 had taken the place of 6, 3 had been in place of 2, 8 in place of 11, etc.

Table 7 shows the simple structure reference vector matrix adopted by 
graphic methods. Only reference vectors 2 and 6 are clearly identical to 
Stroud’s E and D, respectively. Reference vector 12 is a poor representative 
of Stroud’s B, while vectors 1 and 11 together determine the covariance 
accounted for by Stroud’s A and C. The question may therefore be raised 
whether MTAM has achieved its goal in view of discrepancies with Stroud’s 
solution. However, the reader may convince himself by inspecting Table 7 
that the MTAM rotated reference vector matrix shows excellent simple 
structure. Plots of the reference axes against each other (not shown) confirm 




TABLE 7 


Stroud's Termite Soldiers: MTAM Solution compared with 
Stroud's Graphic Solution 




Variables Reference vectors 


1 u -12 09 оц 
2 14 51 -05 -08 16 
3 00 62 15 05 -12 
h 23 01 16 -08 08 
5 -07 -10 ho 12 -01 
6 -07 14 47 -01 02 
T7 07 00 -08 6 17 
6 -07 06 ol Po 36 
9 05 -01 00 41 01 

10 21 pi 18 -0l, oh 

li oT -03 09 32 -04 

* 08 -01 01 00 ho 

13 23 -10 05 oT 08 

14 24 02 05 


30 0 10 
3 i 98 26 -o6 57 
D 6 bs -01 09 52 
5 an p 3e 22 05 
6 9 93 -03 hio -12 
1 p p Ol 46 12 
8 52 26 ob -u -0l 
23 38 02 -01 -06 
2 д 23 -06 -0l -10 
e *5 28 2h 2 

n 28 10 0 
12 д -02 -07 „її 
-01 28 30 ol 09 

13 28 03 25 

ИЛ ho лі 08 10 -08 


of times until Stability of V, 
ence vectors would h 


TABLE 8

Correlations between Reference Vectors of Table 7

MTAM
Reference vectors
          1       2       6      11      12
 1      1.00
 2      -.39    1.00
 6      -.47     .13    1.00
11      -.51    -.03    -.28    1.00
12      -.42     .33    -.18     .18    1.00

Stroud's graphic solution
Reference vectors
          A       B       C       D       E
 A      1.00
 B      -.47    1.00
 C       .43      ?     1.00
 D      -.38    -.49     .01    1.00
 E      -.62     .26     .05     .03    1.00

(? = entry illegible in the source.)


to the number, $k$, of factors in the matrix. In such a case the choice of the
correct reference vectors would, of course, not present any problem and 
advantages (2) and (3) of MTAM given above would vanish. Until MTAM 
has been programmed on a computer (to permit rapid iteration) this suppo- 
sition cannot be tested. The single iteration performed in connection with 
the MTAM study of the Stroud data yielded neither conclusive nor even 
suggestive evidence on this point. 


The Application of MTAM to Three Other Matrices

Graphic solutions for simple structure are unavailable for these matrices,
so that the success of MTAM has to be judged by the criteria of simple
structure mentioned above. The biological plausibility of the factors
obtained is not discussed, being beyond the stated purpose of this paper.
These matrices and others will be discussed in detail in several publications
in preparation.


TABLE 9

MTAM Solution for Correlation Matrix of 14 Morphological Characters
in Houseflies. Simple Structure Matrix Above,
Correlations between Reference Vectors Below

                     Reference vectors
Variables     I      II     III     IV      V      VI
A            34      05     19     -03     32       ?
B            46      12     00      01     25     -05
C            48     -03      ?      00    -09      01
D             ?      17     19     -01    -03      03
E            12      30     17      03    -03     -01
F            02      45    -03     -04    -05     -02
G           -01      00     40      00     08     -05
J           -04      03    -04      61    -02      05
K            03     -04     02      60     00     -05
L           -0?      08     22     -01     26      05
M            09      07     16      10     27      01
N            06     -07     17     -04     11     -03
O            05     -05    -01      01     02      77
P           -04      01     00     -01    -03      78

Reference            Reference vectors
vectors       I      II     III     IV      V      VI
I           1.00
II          -.32    1.00
III          .03    -.67    1.00
IV           .05    -.06     .02    1.00
V            .02    -.39      ?     -.02    1.00
VI          -.?5     .02    -.05    -.17    -.13    1.00

(? = digit or entry illegible in the source.)
Table 9 is based on an MTAM analysis of an orthogonal factor matrix of
six factors and fourteen morphological characters in [number illegible]
houseflies, Musca


TABLE 10

MTAM Solution for Correlation Matrix of 18 Morphological Characters
in the Aphid, Pemphigus populi-transversus. Simple Structure
Matrix Above, Correlations between Reference Vectors Below

                         Reference vectors
Variables     I      II     III     IV      V      VI     VII
B            25      28     04      13     04     -02    -06
C            42      02     07      01    -05     -01    -17
D           -01      53     06      08    -11     -04     10
E            37      34    -06      07     01      02     04
F            17      46    -03     -0?    -01     -0?    -01
G            28      15     05      27     05     -07    -05
H            06      28     06      16     13      08    -06
I            09      02    -02      34    -01     -11     02
L            02     -04     43      07     37     -01     00
M            08      0?     54      01     09     -08    -07
N            03     -03     21      34    -08     -01     11
O            02     -03    -04      48    -05     -07     40
P           -06     -01     08     -09     43      02     0?
Q            06      00     21     -05     17      07     12
R           -10      01     05      10    -15      32     01
S            08     -01    -09     -02     07      52    -02
T           -12      00    -03      07     20     -08     1?
U           -19      07    -06      08     14     -01     32

Reference                Reference vectors
vectors       I      II     III     IV      V      VI     VII
I           1.00
II          -.22    1.00
III         -.19    -.04    1.00
IV           .42    -.66    -.27    1.00
V           -.18    -.02     .10      ?     1.00
VI           .13     .00    -.26      ?      .18    1.00
VII         -.56     .13    -.12     .30     .23    -.13    1.00

(? = digit or entry illegible in the source.)


domestica. The solution appears to be a satisfactory simple structure as seen
from the factor matrix in the upper part of the table and the correlation
matrix between reference vectors in the lower part. The simple structure is
interpretable from the biological point of view.

Table 10 shows the result of MTAM on an orthogonal factor matrix of
[...] morphological characters in [...] of the aphid
species Pemphigus populi-transversus. The original [...] product-moment
coefficients based on covariances [...]. Again the
solution appears to be a satisfactory simple structure from the statistical
and biological points of view.


TABLE 11

MTAM Solution for Q-Type Correlation Matrix of 4 Genera of Bees
in the Tribe Osmiini. Simple Structure Matrix Above,
Correlations between Reference Vectors Below

Variables
(species code          Reference vectors
numbers)        I      II     III     IV
 4              ?      06      00      ?
 5             66     -06     -03     09
 8              ?     -02      01     12
26             07      22     -09     45
35            -03      75     -02     12
36             00      82      05     ?0
40            -06     -04     -02     ?0
50             08     -03      26     41
67            -01      01      63    -01
68            -01      01      88    -02

[Correlations between reference vectors are illegible in the source
apart from the first diagonal entry, I-I = 1.00.]

(? = digit or entry illegible in the source.)

The simple structure obtained is very [...]. The reference
vectors represent genera of these bees and [...] have been published
elsewhere [9].




If we may generalize from the four matrices described above, MTAM 
appears to provide entirely satisfactory simple structure solutions with a 
minimum of (subjective) decisions required from the investigator. 

REFERENCES 

[1] Carroll, J. B. An analytical solution for approximating simple structure in factor
analysis. Psychometrika, 1953, 18, 23-38.

[2] Cattell, R. B. and Cattell, A. K. S. Factor rotation for proportional profiles:
analytical solution and an example. Brit. J. statist. Psychol., 1955, 8, 81-91.

[3] Ferguson, G. A. The concept of parsimony in factor analysis. Psychometrika, 1954,
19, 281-290.

[4] Fruchter, B. Factor analysis. New York: Van Nostrand, 1954.

[5] Johnson, D. M. and Reynolds, F. A factor analysis of verbal ability. Psychol. Rec.,
1941, 4, 183-195.

[6] Neuhaus, J. O. and Wrigley, C. The quartimax method: an analytical approach to
orthogonal simple structure. Brit. J. statist. Psychol., 1954, 7, 88-92.

[7] Pinzka, C. and Saunders, D. R. Analytic rotation to simple structure, II: Extension
to an oblique solution. Educational Testing Service Bulletin, RB-54-31, 1954.
(Multilithed)

[8] Saunders, D. R. An analytical method for rotation to orthogonal simple structure.
Educational Testing Service Bulletin, RB-53-10, 1953. (Multilithed) (also Amer.
Psychologist, 1953, 8, 428. (Abstract))

[9] Sokal, R. R. Quantification of systematic relationships and of phylogenetic trends.
Montreal: Proc. Tenth Internat. Congr. Entomology, 1958, in press.

[10] Sokal, R. R. A comparison of five tests for completeness of factor extraction. 1958,
(in preparation).

[11] Stroud, C. P. An application of factor analysis to the systematics of Kalotermes. Syst.
Zool., 1953, 2, 75-92.

[12] Thurstone, L. L. An analytical method for simple structure. Psychometrika, 1954, 19,
173-194.

[13] Tucker, L. R. The objective definition of simple structure in linear factor analysis.
Psychometrika, 1955, 20, 209-225.

[14] Wright, S. General, group, and special size factors. Genetics, 1932, 17, 603-619.

[15] Wright, S. The interpretation of multivariate systems. Statistics and mathematics in
biology. Ames, Iowa: Iowa State College Press, 1954.


Manuscript received 7/29/57 


Revised manuscript received 12/24/57 




PSYCHOMETRIKA, VOL. 23, NO. 2
JUNE, 1958


ATTENUATION AND INTERACTION 


QUINN MCNEMAR 
STANFORD UNIVERSITY 


A significance test is proposed for determining whether a correlation coefficient
is less than unity by an amount greater than that attributable to errors of
measurement.


It is the purpose of this note to point out a connection between
correction for attenuation and interaction in the analysis of variance. Let $X_{itm}$
stand for the $m$th parallel measure of the $i$th individual on the $t$th test, with
$i = 1, \ldots, N$; $t = 1, 2$; and $m = a, b$. In order to give the problem an analysis
of variance setting it is necessary to have all scores in comparable form. A
common metric will result if, by using the respective means and sigmas, all
four sets of scores are separately transformed to standard scores, with mean
of zero and variance of unity. Thus regard $X_{itm}$ as a standard score. The
necessary transformations, which will not lead to any loss in generality, will
permit certain simplifications in the sequel.

The $4N$ scores, along with possible means, may be set forth as in Table
1, from which it is readily seen that the setup corresponds to a three-way
analysis of variance. Keeping in mind that with standard scores certain
means (as indicated in Table 1) will be zero, we may write out the analysis
of variance depicted in Table 2. The degrees of freedom add to $4N - 4$;
that this is correct may be inferred from the fact that for each of the four
sets of scores there are restrictions on the deviations.

TABLE 1

Score Schema

                       Test 1                          Test 2
                  Measure                         Measure
                 a         b       Mean          a         b       Mean
Individual      (u)       (v)      (w)          (x)       (y)      (z)
1              X_11a     X_11b    X_11.        X_12a     X_12b    X_12.
2              X_21a     X_21b    X_21.        X_22a     X_22b    X_22.
...
i              X_i1a     X_i1b    X_i1.        X_i2a     X_i2b    X_i2.
...
N              X_N1a     X_N1b    X_N1.        X_N2a     X_N2b    X_N2.
Mean           X_.1a  =  X_.1b  = X_.1.   =    X_.2a  =  X_.2b  = X_.2.  = 0

TABLE 2

Analysis of Variance

Source         Sum of Squares                                         d.f.      Var. est.
Individuals    $4\sum_i X_{i..}^2$                                    N - 1     $s_i^2$
Tests          Vanishes                                               0         --
Measures       Vanishes                                               0         --
T x M          Vanishes                                               0         --
I x T          $2\sum_i\sum_t (X_{it.} - X_{i..})^2$                  N - 1     $s_{it}^2$
I x M          $2\sum_i\sum_m (X_{i.m} - X_{i..})^2$                  N - 1     $s_{im}^2$
I x T x M      $\sum_i\sum_t\sum_m (X_{itm} - X_{it.} - X_{i.m} + X_{i..})^2$   N - 1     $s_{itm}^2$
Total          $\sum_i\sum_t\sum_m X_{itm}^2$                         4N - 4

Now if true scores were available on the two tests and if the two tests
were perfectly correlated, the individuals by tests interaction would be exactly
zero. With fallible scores the correlation is attenuated; that is, the
correlation between $X_{i1.}$ and $X_{i2.}$ would be less than unity by an amount
attributable to error (of measurement), hence the $I \times T$ interaction would
not be zero. Significant $I \times T$ interaction, when tested against the $I \times T \times M$
interaction as error, would mean that the correlation corrected for attenuation
is significantly less than unity, whereas insignificant $I \times T$ interaction would
lead to the acceptance of the hypothesis that the two tests measure identical
functions.





For the tedium of converting to standard scores and doing the required
computations for an analysis of variance, the tedium of computing correlation
coefficients, some of which would ordinarily be needed anyway, may be
substituted. The a and b measures on each test may be regarded as scores
on parallel halves of the tests, hence each $r_{ab}$ would be a split half reliability
coefficient (not stepped up), and the correlation between the averages,
$X_{i1.}$ and $X_{i2.}$, will be precisely the correlation between the underlying sums
(of two scores based on halves), i.e., the correlation between the two tests.

Let us now consider the three-way interaction. It can be shown ([4], pp.
296-297) that the $I \times T \times M$ interaction sum of squares reduces to

$$\tfrac{1}{4} \sum_i \left[ (X_{i1a} - X_{i1b}) - (X_{i2a} - X_{i2b}) \right]^2, \tag{1}$$

which is a function of the difference between parallel measures on test 1 and
on test 2. Cumbersome subscripts may be avoided by setting

$$X_{i1a} = u, \quad X_{i1b} = v, \quad X_{i1.} = w,$$

$$X_{i2a} = x, \quad X_{i2b} = y, \quad X_{i2.} = z.$$

In what follows it must be remembered that $u$, $v$, $x$ and $y$ are in standard
score form; $w$ and $z$ are not standard scores.
Expression (1) may be written as

$$\tfrac{1}{4} \left[ \sum (u - v)^2 + \sum (x - y)^2 - 2 \sum (u - v)(x - y) \right] \tag{2}$$

which becomes

$$\begin{aligned}
&\tfrac{1}{4} \left( N\sigma_{u-v}^2 + N\sigma_{x-y}^2 - 2Nr_{ux} + 2Nr_{uy} + 2Nr_{vx} - 2Nr_{vy} \right) \\
&\quad = \tfrac{N}{4} \left[ 2(1 - r_{uv}) + 2(1 - r_{xy}) - 2r_{ux} + 2r_{uy} + 2r_{vx} - 2r_{vy} \right] \\
&\quad = \tfrac{N}{2} \left( 2 - r_{uv} - r_{xy} - r_{ux} + r_{uy} + r_{vx} - r_{vy} \right)
\end{aligned} \tag{3}$$

as the value of the $I \times T \times M$ interaction sum of squares.

Turning next to the $I \times T$ interaction sum of squares, note that by
assigning to $t$ the explicit values, 1 and 2, the sum of squares may be written as

$$2 \sum_i (X_{i1.} - X_{i..})^2 + 2 \sum_i (X_{i2.} - X_{i..})^2$$

which by a simple procedure ([2], p. 246) reduces to

$$\sum_i (X_{i1.} - X_{i2.})^2 \quad \text{or} \quad \sum (w - z)^2$$

in simplified notation. This becomes

$$\sum w^2 + \sum z^2 - 2 \sum wz,$$

and since $w = (u + v)/2$ and $z = (x + y)/2$,

$$\tfrac{1}{4} \left( \sum u^2 + \sum v^2 + \sum x^2 + \sum y^2 + 2\sum uv + 2\sum xy - 2\sum ux - 2\sum uy - 2\sum vx - 2\sum vy \right).$$

Each term in the foregoing involves either $N$ times a variance (of unity) or
$N$ times a correlation coefficient; hence

$$\tfrac{N}{2} \left( 2 + r_{uv} + r_{xy} - r_{ux} - r_{uy} - r_{vx} - r_{vy} \right) \tag{4}$$

is the sum of squares for the $I \times T$ interaction.

The six $r$'s called for in (3) and (4) can, of course, be computed without
transforming to standard scores. These $r$'s are required in Yule's correction
for attenuation formula ([1], pp. 209-210).

The value of $r_{wz}$, which ordinarily will be needed for descriptive purposes,
can readily be obtained in terms of these six $r$'s by substituting in the
appropriate formula for the correlation of sums (or averages):

$$r_{wz} = \frac{r_{ux} + r_{uy} + r_{vx} + r_{vy}}{\sqrt{2 + 2r_{uv}}\,\sqrt{2 + 2r_{xy}}}. \tag{5}$$

If the parallel halves of test 1 (also test 2) yield strictly comparable raw
scores, i.e., equal sigmas, (5) will give a value exactly equivalent to the
correlation between the raw scores of the two tests.

When (3) and (4) are divided by their $df$'s, the two resulting variance
estimates lead to

$$F_1 = \frac{2 + r_{uv} + r_{xy} - r_{ux} - r_{uy} - r_{vx} - r_{vy}}{2 - r_{uv} - r_{xy} - r_{ux} + r_{uy} + r_{vx} - r_{vy}} \tag{6}$$

with $n_1 = n_2 = N - 1$ as the $df$'s.

Since no assumptions were made in arriving at (6), the only assumptions
to be met are those which underlie the analysis of variance technique. For
the given situation the assumptions are that the errors of measurement
entering into the $X_{itm}$ scores are independently and normally distributed
with equal variance. Inequality of variances will mean that the tests
differ as to reliability, hence a valid interpretation of $F_1$ holds only when the
reliabilities are the same. These same assumptions hold for
$F_2$ given below.
A little consideration of the $I \times M$ term in Table 2 leads to the a priori
assumption that it will be zero within chance limits, hence its sum of squares
may be combined with that for the $I \times T \times M$ term. Note that this pooled
sum of squares is nothing more than the residual after taking out the sum
due to individuals and that due to $I \times T$ interaction. This residual sum of
squares may be written as

$$\sum_i \sum_t \sum_m (X_{itm} - X_{it.})^2.$$

When $t$ is given the explicit values, 1 and 2, and $m$ the values, a and b,
this sum of squares may be expressed, in simplified notation, as

$$\sum (u - w)^2 + \sum (v - w)^2 + \sum (x - z)^2 + \sum (y - z)^2.$$

Replacing $w$ by $(u + v)/2$ and $z$ by $(x + y)/2$, this can be simplified to

$$\tfrac{1}{2} \sum (u - v)^2 + \tfrac{1}{2} \sum (x - y)^2,$$

which, since the data are in standard scores, becomes

$$N(1 - r_{uv}) + N(1 - r_{xy}),$$

or

$$N(2 - r_{uv} - r_{xy})$$

as the residual sum of squares. The $df$ for this residual is $2N - 2$, hence

$$F_2 = \frac{2 + r_{uv} + r_{xy} - r_{ux} - r_{uy} - r_{vx} - r_{vy}}{2 - r_{uv} - r_{xy}} \tag{7}$$

with $n_1 = N - 1$ and $n_2 = 2N - 2$ as the $df$'s.

It is of interest to see what happens to $F_2$ (and indirectly to $F_1$) under
certain conditions. When measures $u$ and $v$ are strictly parallel and measures
$x$ and $y$ are also strictly parallel, it will be seen that the last four $r$'s in the
denominator of $F_1$ will be identical, thus drop out, and leave a denominator
equal to that of $F_2$. By utilizing (5), which holds exactly when the measures
are strictly parallel, the last four $r$'s in the numerators of (6) and (7) can be
replaced by

$$r_{wz} \sqrt{(2 + 2r_{uv})(2 + 2r_{xy})}.$$

This leads to

$$F_2 = \frac{2 + r_{uv} + r_{xy} - r_{wz} \sqrt{(2 + 2r_{uv})(2 + 2r_{xy})}}{2 - r_{uv} - r_{xy}}. \tag{8}$$

When the two tests are equally reliable, and $r_{uv} = r_{xy} = r$,

$$F_2 = \frac{2 + 2r - r_{wz}(2 + 2r)}{2 - 2r}$$

or

$$F_2 = \frac{(1 + r)(1 - r_{wz})}{1 - r}. \tag{9}$$



The $r$ in (9) is the reliability for scores based on half a test. By the
Brown-Spearman formula the reliability, $r'$, of total scores would be

$$r' = 2r/(1 + r),$$

from which

$$r = r'/(2 - r').$$

Substituting in (9)

$$F_2 = \frac{\left[ 1 + r'/(2 - r') \right] (1 - r_{wz})}{1 - r'/(2 - r')}$$

which simplifies to

$$F_2 = \frac{1 - r_{wz}}{1 - r'}. \tag{10}$$

Now, finally, if the $w$ and $z$ sets of scores are thought of as having been
transformed to standard score form, the last expression becomes

$$F_2 = \frac{1 - r_{wz}}{\sigma_e^2}, \tag{11}$$


in which c? is the error of measurement variance com 
When both the ш and the z scores have b 


s Will follow the F 
and by inference F, and 


an observed correlation deviates 


even by (6) or the Р, or (7), or their equivalents 
» Would be used to determine 
5 was farther below unity than 
a larger df for the 


uation (2), js equivalent to the a 
priori assumption of no 7 хм interaction, which, he ae s uires that 
certain population correlations, instead of coy. be в i The P 
method assumes that the two tests are equally reliable , пе 


e (in the population) 




whereas the likelihood ratio test is not thus restricted. Lord's is a large 
sample significance test whereas the F test may be used for either large or 
small samples.

Until such time as the mathematical interrelation, if any, of the two 
solutions has been ascertained, it is of interest to see whether or not the F 
test yields levels of significance similar to those of the likelihood ratio test 
when applied to Lord's illustrative examples. For this purpose the probabilities 
for his normal deviates have been determined to two figures beyond the cipher, 
and a curve relating p to F has been used for graphical interpolation in order 
to specify the p's associated with the observed F's as computed by formula
(7) above. For the seven examples in Lord's Table 1, each of which involves 
equal observed reliabilities, we have the following levels of significance 
(Lord's p's given first in each pair): .024, .023; .0028, .0025; .17, .17; .062, 
.061; .024, .023; .0062, .0080; .0029, .0025. Admittedly, the graphic determi- 
nations of the fourth decimal figures are rough approximations. 

Lord also applied his method to data on 649 cases for which the observed 
reliabilities were different, .669 and .757. Since the F for these data goes far 
beyond available tabled values, it was transformed to a normal deviate by 
an approximation method given in Wallis and Roberts ([5], p. 458). This 
yielded a normal deviate of 6.19, compared to the 5.94 obtained by the 
likelihood ratio test. For these same data treated as though based on N = 101, 
Lord finds significance at the .010 level whereas the F test leads to .007 by 
both graphical interpolation and the Wallis-Roberts transformation. 

Such close agreement in results as shown in these nine examples may
not occur when the equal reliability assumption does not hold, but since the
study by Norton, as reported in Lindquist [2], shows that marked heterogeneity
of variances (25, 100, and 225) has little effect on F tests (i.e., a p
of .01 is near the .02 level and a p of .05 is near the .07 level), it can be
presumed that such differences in reliabilities as are apt to be encountered in
practice will not seriously disrupt the F test proposed above. Reliabilities of
.80 and .95, .60 and .90, .40 and .85, and even .20 and .80, each set of which
leads to error variances differing by a factor of four, would not be nearly as
extreme as the differences in the Norton study.


REFERENCES

[1] Kelley, T. L. Statistical method. New York: Macmillan, 1924.

[2] Lindquist, E. F. Design and analysis of experiments. Boston: Houghton Mifflin, 1953.

[3] Lord, F. M. A significance test for the hypothesis that two variables measure the same
trait except for errors of measurement. Psychometrika, 1957, 22, 207-220.

[4] McNemar, Q. Psychological statistics. New York: Wiley.

[5] Wallis, W. A. and Roberts, H. V. Statistics. Glencoe, Ill.: The Free Press.


Manuscript received 8/20/52


Revised manuscript received 11/9/57 




PSYCHOMETRIKA—VOL. 23, NO. 3
SEPTEMBER, 1958 


THE KUDER-RICHARDSON FORMULA (21)
AS A SPLIT-HALF COEFFICIENT, AND SOME REMARKS
ON ITS BASIC ASSUMPTION


SAMUEL B. LYERLY
WASHINGTON, D. C.

Case IV of the Kuder-Richardson series, their formula (21), is derived
as a generalized split-half Spearman-Brown coefficient. The basic assumption
employed is shown to be sufficient to justify the various assumptions
used in derivations by other authors. Some of the implications of this
assumption are discussed.

Cronbach [1] showed that the Case III reliability coefficient of Kuder
and Richardson [2] is the mean of all possible split-half Spearman-Brown
coefficients. This demonstration was useful to students of test theory in
bringing the Kuder-Richardson formulation closer into line with the familiar
split-half concept. It is the purpose of this paper to present a development
of the Kuder-Richardson formula (21) from the split-half point of view
and to comment on the basic assumption underlying this useful coefficient.


A test of $n$ items ($n$ assumed to be an even number for convenience)
may be split into halves in $n!/\{2[(n/2)!]^2\}$ possible ways; with random splitting
each possibility may be regarded as equally likely. Suppose all of these
splits are made for a given individual and all of the possible pairs of half
tests for him are scored. From the well-known theorem concerning samples
drawn without replacement from a finite population, the variance of either
half-score distribution is

$$\mathrm{var}_i = \frac{X_i(n - X_i)}{4(n - 1)} , \tag{1}$$

where $X_i$ is the individual's total score on the $n$ items.
Since the correlation between all pairs of half scores (or any sample of them)
is $-1$, their covariance is

$$\mathrm{cov}_i = -\frac{X_i(n - X_i)}{4(n - 1)} . \tag{2}$$
Since the pair of scores resulting from one split is as likely as the pair
resulting from another split, there is no reason for preferring one pair
rather than another; consider the entire set and record all the pairs in a
bivariate table or plot them all on a scatter diagram. Use this procedure
for each of the N individuals and compute the correlation coefficient over
all N sets of pairs. When a number of equally numerous distributions
are merged, the variance of the resulting total distribution is equal to the
average of the variances of the separate distributions plus the variance of
their means. Using this identity and (1),

$$\mathrm{var}_T = \frac{1}{N} \sum \frac{X_i(n - X_i)}{4(n - 1)} + \frac{\sum X_i^2 - (\sum X_i)^2/N}{4N} . \tag{3}$$

(All summations in this and following equations are over the N subjects.)

Similarly, when equally numerous bivariate distributions are combined,
the total covariance is equal to the average of the separate covariances plus
the covariance of the means. In our case each separate bivariate distribution
(pairs of half scores for an individual subject) is symmetric about a mean of
$(X_i/2, X_i/2)$ and the correlation over all N sets of means must be $+1$.
Their covariance is therefore $[\sum X_i^2 - (\sum X_i)^2/N]/4N$, and thus

$$\mathrm{cov}_T = -\frac{1}{N} \sum \frac{X_i(n - X_i)}{4(n - 1)} + \frac{\sum X_i^2 - (\sum X_i)^2/N}{4N} . \tag{4}$$


Since (3) is the variance of either set of half scores, the correlation
coefficient is simply (4) divided by (3). Performing this division gives

$$r_{hh} = \frac{\sum X_i^2 - (\sum X_i)^2/N - \sum X_i(n - X_i)/(n - 1)}{\sum X_i^2 - (\sum X_i)^2/N + \sum X_i(n - X_i)/(n - 1)} = \frac{N s_t^2 + (\sum X_i^2 - n \sum X_i)/(n - 1)}{N s_t^2 - (\sum X_i^2 - n \sum X_i)/(n - 1)} ,$$

where $s_t^2$ is the variance of the total test scores over the N subjects.
Applying the Spearman-Brown formula, $r_{tt} = 2r_{hh}/(1 + r_{hh})$, and simplifying,

$$r_{tt} = \frac{s_t^2 + (\sum X_i^2 - n \sum X_i)/N(n - 1)}{s_t^2} , \tag{5}$$

which is the Kuder-Richardson formula (21). Replacing $\sum X_i^2$ by $N s_t^2 + N M_t^2$,
and $\sum X_i$ by $N M_t$, where $M_t$ is the mean of the scores, (5) becomes

$$r_{tt} = \frac{(n - 1) s_t^2 - n M_t + s_t^2 + M_t^2}{(n - 1) s_t^2} ,$$

which simplifies to

$$r_{tt} = \frac{n s_t^2 - n M_t + M_t^2}{(n - 1) s_t^2} . \tag{6}$$
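The split-half argument can be checked by brute force on a small data set. The Python sketch below enumerates every split for every subject, pools all the half-score pairs, correlates them, applies the Spearman-Brown formula, and compares the result with K-R (21) written as $(n s_t^2 - n M_t + M_t^2)/[(n - 1) s_t^2]$, using variances with divisor N. The function names and the four-subject response matrix are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pooled_half_score_pairs(responses):
    """List every (half A, half B) score pair for every subject."""
    pairs = []
    for row in responses:
        n = len(row)            # number of items, assumed even
        total = sum(row)        # the subject's total score X_i
        # combinations() yields each half and its complement, so the
        # pooled distribution is symmetric in the two halves.
        for half in combinations(range(n), n // 2):
            a = sum(row[j] for j in half)
            pairs.append((a, total - a))
    return pairs


def correlation(pairs):
    """Product-moment correlation with divisor k (population form)."""
    k = len(pairs)
    mean_a = sum(a for a, _ in pairs) / k
    mean_b = sum(b for _, b in pairs) / k
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / k
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / k
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / k
    return cov / sqrt(var_a * var_b)


def split_half_reliability(responses):
    """Pooled half-score correlation stepped up by Spearman-Brown."""
    r_hh = correlation(pooled_half_score_pairs(responses))
    return 2 * r_hh / (1 + r_hh)


def kr21(totals, n_items):
    """K-R (21) from the mean and variance (divisor N) of total scores."""
    n_subjects = len(totals)
    mean = sum(totals) / n_subjects
    var = sum((x - mean) ** 2 for x in totals) / n_subjects
    return (n_items * var - n_items * mean + mean ** 2) / ((n_items - 1) * var)


# Hypothetical 0/1 item responses: 4 subjects by 4 items.
responses = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
totals = [sum(row) for row in responses]
pooled = split_half_reliability(responses)
direct = kr21(totals, n_items=4)
print(round(pooled, 6), round(direct, 6))
```

Because the pooled result depends only on the total scores, rearranging the 1's within any row should leave it unchanged.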


Equation (6) is one of the familiar formulas for computing K-R (21) using the
mean and variance of the scores and the number of items in the test.

A number of derivations of K-R (21) have appeared in the literature,
each using somewhat different sets of stated assumptions. A specific assumption
used in this paper is implied in the procedure described above in which
all the possible pairs of half scores for all individuals in the sample were used
to compute the reliability coefficient. It is assumed not only that a pair of
scores resulting from one split is as likely to occur as a pair resulting from
another split for a given subject, but that each such pair can occur in
combination with any possible pair for any other subject in the sample. This suggests
one way of stating the difference between the Kuder-Richardson formulas
(20) and (21). From the split-half point of view, K-R (20) is the expected
value of the coefficient when the same split is made for every subject. Making
a different random split for each subject yields a coefficient which converges
toward K-R (21).
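The contrast between the two formulas can be seen by computing both on the same item matrix. The sketch below uses the standard textbook forms, which are assumptions here since this paper does not print K-R (20): K-R (20) as $[n/(n - 1)][1 - \sum p_j q_j / s_t^2]$ and K-R (21) as $[n/(n - 1)][1 - M_t(n - M_t)/(n s_t^2)]$, with variances taken with divisor N. The function names and data are hypothetical.

```python
def total_score_stats(responses):
    """Return (n_items, mean, variance with divisor N) of total scores."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_subjects
    var = sum((x - mean) ** 2 for x in totals) / n_subjects
    return n_items, mean, var


def kr20(responses):
    """K-R (20): uses each item's own difficulty p_j."""
    n_items, _, var = total_score_stats(responses)
    n_subjects = len(responses)
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var)


def kr21(responses):
    """K-R (21): uses only the mean and variance of total scores."""
    n_items, mean, var = total_score_stats(responses)
    return n_items / (n_items - 1) * (1 - mean * (n_items - mean) / (n_items * var))


# Hypothetical responses in which the items differ sharply in difficulty.
responses = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
value_20 = kr20(responses)
value_21 = kr21(responses)
print(round(value_20, 6), round(value_21, 6))
```

When the items differ in difficulty, as in this example, K-R (21) falls below K-R (20); the two coincide when every item has the same difficulty.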

The above discussion has been couched in split-half terminology. Other 
writers on K-R (21) have stated their assumptions in terms of their own 
approaches to the reliability concept. One basic assumption is required; 
it can be shown that the assumptions in the various derivations, including 
Kuder and Richardson’s original presentation, can be deduced from it. 

The basic assumption for K-R (21) can be stated thus: In a test of n
items which are scored 0 or 1 for failure or success, respectively, each attempt
at an item by a subject is an independent trial in the Bernoulli sense. A Bernoulli
trial is an event which has two possible outcomes, which may be called
success or failure. If a number of independent trials are performed, they
are indistinguishable so far as probability of success is concerned. Tossing
coins, throwing dice, and drawing balls from urns are classic examples.

For mental tests, this assumption implies that individual items have no 
separate identities, even though each may be worded differently or may ask 
a question different from that asked by any other item. Furthermore, item 
equivalence from person to person does not exist. The same item, worded 
identically, may appear on everyone's test booklet, but item 1 for Subject 
A is regarded as being no more like item 1 for Subject B than it is like item 
2 for Subject B. The two sets of items are, in effect, different samples of 
items from a common pool or population. Since items do not have individual 
identities, any method of splitting the test into halves is necessarily random 
from subject to subject, and the assumption employed in the above split-
half derivation follows. For the same reason, any item statistic (difficulty,
variance, item-test correlation, etc.) can be computed only by selecting 
responses from subject to subject at random. These considerations dictate 
the “equal difficulty, equal correlations, matrix of unit rank" assumptions 
of Kuder and Richardson. Lord's derivation [3], from the point of view of 
parallel tests constructed by random sampling from an item pool, implicitly 
uses this assumption. The K-R (20) formula, on the other hand, “recognizes” 
items and makes use of item-subject interaction, which is assumed not
to exist in the K-R (21) formulation.


REFERENCES

[1] Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika,
1951, 16, 297-334.

[2] Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability.
Psychometrika, 1937, 2, 151-160.

[3] Lord, F. M. Sampling fluctuations resulting from the sampling of test items. Psychometrika,
1955, 20, 1-22.

Manuscript received 7/30/57

Revised manuscript received 1/14/58


PSYCHOMETRIKA—VOL, 23, NO. 3 
SEPTEMBER, 1958 


THE AVERAGE SPEARMAN RANK CRITERION CORRELATION 
WHEN TIES ARE PRESENT 


EDWARD E. CURETON
UNIVERSITY OF TENNESSEE 
This note presents the average Spearman rank correlation between m 
independent rankings and an untied criterion ranking, corrected for ties in 
any or all of the independent rankings. 
Lyerly [2] has considered the average Spearman rank correlation between 
m experimentally independent rankings of n persons or objects and a criterion 
ranking, with no ties in any of the rankings, and has shown that its distri- 
bution approaches normality rapidly as m and n increase. 
Let y represent a criterion rank, x a rank in any one of the independent 
rankings, and 


$$X = \sum x . \tag{1}$$

Then X is the sum of the independent rankings of one person or object, i.e.,
the sum of the independent rankings for fixed y. With this notation, Lyerly's
formula may be written,

$$\rho_{av} = 1 + \frac{12 \sum Xy}{m(n^3 - n)} - \frac{4n + 2}{n - 1} . \tag{2}$$


The total sum of squares of differences between a y and an x, as given by
Lyerly, is

$$\sum^{m} \sum^{n} d^2 = \frac{mn}{3}(n + 1)(2n + 1) - 2 \sum Xy . \tag{3}$$

Dividing by m to obtain the average and then substituting this average
for $\sum d^2$ in the usual formula for the Spearman rank correlation
[$\rho = 1 - 6 \sum d^2/(n^3 - n)$], the result reduces to (2).
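For the untied case, the reduction to (2) can be checked numerically. The sketch below takes (2) to be $\rho_{av} = 1 + 12 \sum Xy / [m(n^3 - n)] - (4n + 2)/(n - 1)$ and compares it with the plain mean of the m separate Spearman coefficients; the function names and the three example rankings are hypothetical.

```python
def spearman_rho(x, y):
    """Usual Spearman coefficient for two untied rankings."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def average_rho_from_rank_sums(rankings, criterion):
    """Average criterion correlation computed from the rank sums X."""
    m = len(rankings)
    n = len(criterion)
    # X for each person: the sum of that person's ranks over the m rankings.
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    sum_xy = sum(big_x * y for big_x, y in zip(rank_sums, criterion))
    return 1 + 12 * sum_xy / (m * (n ** 3 - n)) - (4 * n + 2) / (n - 1)


criterion = [1, 2, 3, 4, 5]
rankings = [[1, 2, 3, 4, 5], [2, 1, 3, 5, 4], [5, 4, 3, 2, 1]]

mean_of_rhos = sum(spearman_rho(r, criterion) for r in rankings) / len(rankings)
from_rank_sums = average_rho_from_rank_sums(rankings, criterion)
print(round(mean_of_rhos, 6), round(from_rank_sums, 6))
```

Only the rank sums X and the criterion ranks are needed for the second route, which is the computational convenience of the formula.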

Now assume that ties may be present in any or all of the independent
rankings. When any one such independent ranking is correlated with the
criterion ranking, Kendall ([1], p. 31) shows that $\sum d^2$ must be increased
by $\sum (t^3 - t)/12$, where t is the number of persons or objects having equal
ranks in each tied set, and the summation is over all sets of ties in the
independent ranking. Let

$$T = \sum (t^3 - t) , \tag{4}$$

the summation now being over all sets of tied ranks in all m independent
rankings. Then (3) must be increased by adding $T/12$ to the terms on the
right. Dividing by m to obtain the average value of $\sum d^2$ corrected for ties,
and substituting the resulting expression for $\sum d^2$ in the formula for the
Spearman rank correlation, yields

$$\rho_{av} = 1 - \frac{4n + 2}{n - 1} + \frac{24 \sum Xy - T}{2m(n^3 - n)} , \tag{5}$$

the average Spearman rank criterion correlation corrected for ties.

Note that $\rho_{av}$ as defined by (5) does not have limits $\pm 1$ when ties are
present; the ties in any one independent ranking prevent complete agreement
with the criterion ranking even if the order of its ranks is otherwise identical
with that of the criterion series. This formula is in fact a generalization of
Kendall's $\rho_a$ ([1], p. 29). The criterion ranking, moreover, may not contain
ties; the problem of the correlation between an independent ranking and a
criterion ranking which includes true ties has never to the writer's knowledge
been investigated.

Kendall shows ([1], p. 64, footnote) that the standard error of $\rho$ in the
null case is the same when ties are present as when they are not. Hence the
critical ratio,

$$CR = \rho_{av} \sqrt{m(n - 1)} , \tag{6}$$

may be taken as a unit-normal deviate in testing the hypothesis that the
true average correlation is 0, whether or not ties are present. There is no
corresponding theory to cover the non-null case.

The approach of the distribution of $\rho_{av}$ to normality with increasing n
and m is fairly rapid in the untied case, and should not be appreciably slower
when ties are present except in extreme cases. A conservative recommendation
would be to use (6) whenever:

(a) n > 5 and m + n > 20;

(b) not more than 2/3 of all the ranks in the independent rankings are
tied;

(c) every independent ranking has at least three distinct values.

When these conditions are met, no correction for continuity is needed; this
is fortunate, as the proper correction would be difficult to determine. When
the above conditions are not met, there is no good significance test. The
applicability of the t approximation to $\rho_{av}$ has not been demonstrated, and
direct computation of the exact probability would be prohibitively time
consuming even for small m and n when ties are present.
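The critical ratio (6) and the three conditions (a) to (c) lend themselves to a short computational sketch. The Python below assumes (6) is $CR = \rho_{av}\sqrt{m(n - 1)}$ and treats CR as a unit-normal deviate; the function names, the two-sided p value, and the example rankings are illustrative choices rather than anything prescribed in the note.

```python
import math
from collections import Counter


def critical_ratio(rho_av, m, n):
    """Critical ratio for the null hypothesis of zero average correlation."""
    return rho_av * math.sqrt(m * (n - 1))


def two_sided_p(z):
    """Two-sided tail area of the unit-normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def conditions_met(rankings):
    """Check conditions (a), (b), and (c) for a list of m rankings of n objects."""
    m, n = len(rankings), len(rankings[0])
    tied_ranks = 0
    for ranking in rankings:
        counts = Counter(ranking)
        if len(counts) < 3:                      # condition (c)
            return False
        tied_ranks += sum(c for c in counts.values() if c > 1)
    few_ties = tied_ranks <= 2 * m * n / 3       # condition (b)
    return n > 5 and m + n > 20 and few_ties     # condition (a)


# Hypothetical case: m = 16 rankings of n = 6 objects, average rho of .20.
cr = critical_ratio(0.20, m=16, n=6)
p_value = two_sided_p(cr)
untied = [[1, 2, 3, 4, 5, 6]] * 16
heavily_tied = [[1.5, 1.5, 3.5, 3.5, 5.5, 5.5]] * 16
print(round(cr, 3), conditions_met(untied), conditions_met(heavily_tied))
```

A one-sided test would halve the tail area; which form is appropriate depends on the hypothesis being tested.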


REFERENCES
[1] Kendall, M. G. Rank correlation methods. London: Griffin, 1948.
[2] Lyerly, S. B. The average Spearman rank correlation coefficient. Psychometrika, 1952, 17.

Manuscript received 11/22/57 
Revised manuscript received 1/24/58 




PSYCHOMETRIKA—VOL, 23, NO. 3 
SEPTEMBER, 1958 


NOTE ON “EFFICIENT ESTIMATION AND 
LOCAL IDENTIFICATION IN LATENT CLASS ANALYSIS"* 


RICHARD B. McHUGH
UNIVERSITY OF MINNESOTA


I am indebted to Dr. Albert Madansky for detecting the following two 
errata and for suggesting the corrections outlined below. 

(1) The information functions were incorrectly given ([1], p. 346). They 
may be corrected as follows: 

Write “numerator of qoc, iip AS Qon: and “numerator of 
Quit) 88 Âi iaip) - Then 


Þ3 [dc ciui — фус зое Осеева = Gear cantare] 


iis 


1 
— ate 
п ое? 


for e, d = 1, 2, ,Y— Ц; 


id Giai VG ie iriri Gairen 


1 
= Iuin = > ^ 
qu EHE ga £3 95.5 (Qua — 1d Sia gun — 1+ б) 


fori, 4 = 1,2, =- , 5,60 = 1,2, =+ , y; and 


1 [à булө» л — 3 Ж istnd dao зер) Jie) 
m M = Footy Cintarssip) — Фен: 4» <_< 
"E 27 (безе — L E Gia) 


fore! = 1,2, ++. ,y—1,¢= 1,2, 0°", 7, and1 = 1, 2, »-- ; р, where 


deD gi 


«Des 1 if question 7 is answered yes in the dth member of D, 
"b 


0 otherwise. н 
The errors arise in differentiating log L with respect to f, and gi; . 


"When differentiating log L with respect to fe , c = 1, +t, Y — 1, one must 
differentiate f, = 1 — f, — fa — Б with respect to f. as well; when 


differentiating log L with respect to g;;. , one must evaluate 


9 [f. II ga: ДЇ. (4 = gure] 


iePa 


Juo 


for each d, where P, is the set of questions answered "positively" for the dth 
member of D. After taking these points into account in determining 8. and 
Si; , the information functions as stated above are easily obtained. 


*Psychometrika, 1956, 21, 331-347. 


(2) The statement was also made ([1], p. 337) that if an estimator of
the structural parameter θ is consistent, then θ must be locally identifiable.
Actually, if the estimator is consistent an even stronger statement can be
made, namely that θ is identifiable (see [2], p. 376). However, since the
proposed estimator of the latent parameters is consistent only when certain
(local) restrictions and regularity conditions on the parameters are met,
the fact that the estimator is consistent only implies that the parameters
are locally identifiable and not identifiable.


REFERENCES
[1] McHugh, R. B. Efficient estimation and local identification in latent class analysis.
Psychometrika, 1956, 21, 331-347.
[2] Reiersøl, O. Identifiability of a linear relation between variables which are subject to
error. Econometrica, 1950, 18, 375-389.


Manuscript received 5 /16/5, 




BOOK REVIEWS 


Henry QUASTLER (Editor) Information Theory in Psychology: Problems and Methods, 
Glencoe, Illinois: The Free Press, 1955. Pp. x + 486. 


Several patterns for presenting the proceedings of specialized scientific conferences 
have developed. Differences in patterns can be exemplified in reports on the one topic 
with which the book under review is concerned—information theory. One extreme is 
represented by the book published in 1953 by the Josiah Macy, Jr. Foundation, Cybernetics, 
Circular Causal and Feedback Mechanisms in Biological and Social Systems, which seemed to
have been a scarcely retouched publication of the working papers as they were actually
presented in the conference, together with a practically verbatim transcript of all the 
discussion. A report of another conference on information theory held in 1954 was multi- 
lithed and published more or less informally by J. C. R. Licklider of M.I.T.; this consisted 
of a somewhat filtered and digested version of what went on, Licklider using himself as a 
transducer. A more standard pattern was represented by the University of Pittsburgh’s 
Current Trends in Information Theory (1953), presenting simply a series of scientific papers, 
amply edited and proofread by their authors, who were invited to read them at this 
conference. 

The present volume invites favorable comparison with all of the afore-mentioned. 
Nearly all the working papers presented at the conference are printed as edited by their 
authors after the conference, and interwoven is a sort of running comment by the editor, 
giving his own reactions as well as relaying the significant points which came up in the 
discussion at the conference. Furthermore, the papers are arranged not in the order in 
which they were presented at the conference but in an order which the editor found after 
the conference to be more logical and advantageous. 

The editor is able to make the reader feel that he is not missing any of the really
significant discussion, but he cautions that a number of “really good papers” were not
published because their authors could not find time to prepare them for publication.

So much for the format of this varityped, offset-printed volume. What of its content?
Even though it can hardly be said to have the appearance or organization of a textbook,
the book could well become one of the standard references on information theory
in psychology, for it represents the first serious attempt to assay the positive gains in this
field and to standardize notation and terminology, instead of merely presenting a somewhat
disorganized array of interesting suggestions and speculations, as do some of the other
works on information theory.

In this respect, a most important feature is the generalization of information theory
beyond the bounds of its most frequent application in communication. A few quotations
from the summary composed by the conference members will be of interest. “It is basic to
information theory that any event is evaluated against the background of the whole class
of events that could have happened. Information theory proposes to measure the effect
of operations by which a particular selection is made out of a range of possibilities. ... In
information theory, variation is the indispensable basis of selection, discrimination,
communication, specification, and related operations. Traditional statistics is largely concerned
with what can be done or said in spite of variation; information theory deals with what
can be done because of variation. ... If two objects or events are related, then they must
mutually affect each other’s range of possibilities. Thus, relatedness amounts to a mutually
selective operation. ... In this way, information theory describes the strength of coupling
between components of a system, parts of a structure, members of an organization, and
portions of an ordered sequence in time. ... The relation most commonly studied is
communication.”


Seen in this light, information theory may provide a way of measuring more
objectively the dynamic relations existing among the parts of Gestalten. In fact there were
many hints that the conference members were beginning to be concerned with the study
of complex dynamic systems, particularly those represented by the human being as a
transducer of input.

Perhaps out of modesty, one of the most valuable papers, “Standardized Nomenclature,
An Attempt,” was placed at the end of the section on “foundations” by the
editor, who was also its author. Users of information theory are put on notice, regardless
of their personal predilections, that the basic system of notation in information theory
will involve measures of H (uncertainty or specificity), T (relatedness or communication),
A (partialled relatedness), D (constraint), and C (redundancy), together with associated
subscripts, parenthetical terms, and the like.

Along with notation, one would demand computational procedures, and they are
all here, even tables of log₂ n and −p log₂ p provided by Klemmer from an Air Force
Technical Report. McGill gives a concise and elegant statement of his well-known work
on the relation between multivariate analysis and information theory. There are reports
by Miller, Rogers, Green, and Augenstine of noble attempts to determine the properties
of sampling distributions of information functions; while the sampling distribution has
not yet been determined for the general case, Miller's finding that the basic information
function, H, is a biased statistic should receive wide attention. Attneave shows, however,
that under certain conditions a better estimate of H than that achieved by the Miller-
Madow correction can be made from averaged values in a transmission matrix. The
section on information measures as such comes to a close with a chapter
on approximate procedures for estimating information measures.

By far the largest portion of the book is devoted to reports of Psychologi регі 
ments, or to theoretical. papers concerning specia] applications of infor ot logical experi- 
line with the earlier insistence that “communication” jg only one are mation theory. In 
theory can be applied, there is rea where information 
any kind of symbolic, 9 language or to 
comes to finding anything of this nature, in fı в А пл. апа 
on the dimensionality of facial expression as a vehic ioni ea reports work 
a paper by Fritz and Grier on air traffic control com anila ting ауана чв 
of the papers are concerned with * "m 1 the Contrary, most 
human organisms and machines, We find, first, that the activity сева that, between 
seems to be “unitized,” timewise, е © the organism itself 
experience existed in a series of fr. iiis ient оз as if human 

The very interesting question ich. before ilm, 
with each moment or frame and th г pursued л lon ean be contained 
the concept of channel capacity. Soon we are looking at >y Hake in ц note on 
with a machine—a machine (or rather, an Apparatus) whi Starting to interact 
or buzzers, something which is in effect a Stochastic process: {} S, by means of lights 
to judge the relative frequency of events or 152 11е task of 

take place next. Two types of theory, the 


Procedures, and they are 
emmer from an Air Force 


t i Е the organism is 
9 prediet at qe of time what ae will 
€ othe haces 
been offered to account for the organism’s disturbingly ane *Ssociationistic, have 
situation. Further papers, by Senders and isinger and Xie behavior in this 
the application of information theory to s i avior Bt ts, and others, discuss 
responding to various other types of infor Fi баі S reading dials or 
is that by Bricker, who after a review tona arly interesting paper 
amount of information processed is linear] ime re ents Suggests that the 
Allin all, this is an excellent series of ex Б antitative а OF Processing. 

та lonalizing of psy- 

Harvard University JOHN B. CARROLL




WILLIAM G. COCHRAN AND GERTRUDE M. Cox. Experimental Designs. New York: John 
Wiley & Sons, Inc. Second Edition 1957. Pp. xiv + 611. 


This excellent revision of a well-known and successful textbook is better described 
as a judicial and timely extension than as a reformulation. Experimental Designs, first 
published in 1950, provided an orderly and much needed practical description of experi- 
mental designs important for biological research, particularly agricultural experimentation. 
Cochran and Cox had sought to prepare a handbook of procedures, and from the arith-
metical and practical standpoint their Experimental Designs was admirably complete. It 
was not a beginner’s cook book, however, and a working knowledge of the statistics in- 
volved in experimental analysis, particularly analysis of variance, was assumed. 

The present (second) edition, although written in the framework of the original 
edition, incorporates numerous substantial improvements and extensions. The authors’ 
presentation of the various currently useful experimental plans involves both the intro- 
duction of new approaches and the addition of recent developments in well-established 
procedures. In this way they have sought to accommodate their second edition to the 
continuously emerging requirements for experimental designs. Because of changes in the 
present volume and because of the growth of experimentation in industry, this second
edition will find many more applications in industry than the original work. Nevertheless, 
references to specific industrial problems or processes are relatively few. For the most 
part the references are from the fields of mathematical statistics, biometry, and agri- 
culture. 

This is not a volume in which the typical reader from the behavioral sciences will 
feel at home. Almost all the problems and many of the plans are alien to the experience 
of the behavior scientist, and for this reason he may not recognize plans which are appli- 
cable to his research interest. He will be tempted, moreover, to adapt or modify his re- 
search interests to take advantage of some of the elegance and economy represented by 
many of the plans included in this volume. This is not unusual. Many of the more elegant 
designs employed by psychologists are borrowed from other fields, and certainly designs 
developed in other research areas can not be fairly gauged on the basis of their obvious 
and immediate relevance to the needs of those who investigate behavior. 

Although the present edition is sufficiently concerned with specific procedures and 
explicit cautions to justify its being described as an extremely practical guide to experi- 
mental plans, it is not written at the introductory level. Like its predecessor, this book 
presupposes both sound statistical training and substantial experience in the experimental 
study of various problems. From the standpoint of those who have not fully mastered 
basic statistical analysis, Chapter 3 is particularly and significantly instructive. It provides 
a useful summary of analysis of variance and applications of some of the assumptions 
involved in least squares procedures. 

In addition to the revisions and extensions of the various chapters from the original 
edition, the present volume contains two new chapters. Chapter 6A deals with the use 
of fractional replication in factorial experiments. Chapter 8A includes a section on the
use of discontinuous data, characteristic of many formal analytical plans, with the inter- 
pretation of various functions as continuous trends. The readers will find many new designs 
in this edition including new incomplete block designs and new incomplete factorial designs. 
The scope of the volume is best indicated by the chapter headings: 1, Introduction; 2, 
Methods for Increasing The Accuracy of Experiments; 3, Notes on the Statistical Analysis 
of the Results; 4, Completely Randomized, Randomized Block, and Latin Square Designs; 
5, Factorial Experiments; 6, Confounding; 6A, Factorial Experiments in Fractional Repli- 
cation; 7, Factorial Experiments with Main Effects Confounded: Split-plot Designs; 8, 
Factorial Experiments Confounded in Quasi-Latin Squares; 8A, Some Methods for the
Study of Response Surfaces; 9, Incomplete Block Designs; 10, Lattice Designs; 11, Balanced
and Partially Balanced Incomplete Block Designs; 12, Lattice Squares; 13, Incomplete
Latin Squares; 14, Analysis of the Results of a Series of Experiments; 15, Random
Permutations of 9 and 16 Numbers.

It is possible that some of the designs which are appropriate to sequential
experimentation in industry may find application in the studies of the behavior scientist.
Certainly the economies promised by the various kinds of confounding plans will continue
to challenge the psychological investigator.

This book with its handsome format, most readable type, and well-designed
presentation of analyses and plans will continue to be greatly appreciated as a textbook. The
fact that the new edition comprises additions, refinements, and improvements without
changing the structure or the organization of the original volume should make it very
convenient for the teacher to introduce the new text into courses which used the old text.
The authors are to be commended for undertaking this revision and for their lucid
presentation of an intricate body of knowledge. This book can be of great value both to those
who perform experiments and to those who responsibly read experimental reports.


J. R. WITTENBORN 
Rutgers University 


PSYCHOMETRIKA—VOL. 23, No. 4 
DECEMBER, 1958 


THE MYSTERY OF THE MISSING CORPUS* 


FREDERICK MOSTELLER
HARVARD UNIVERSITY 


Until recently, two happy vices stole a good share of my time. Of those 
burglars, the first was the reading of science-fiction stories, a source of enter- 
tainment that I discovered at the age of eight, but kept manfully in check 
because I never stumbled upon a continuous supply. Of late, however, real 
life has mocked this branch of fiction so closely that I now miss in it the 
elements of fantasy and escape that used to delight. You might say that, 
being crowded out by the real thing, we armchair spacemen face technolog- 
ical obsolescence. 

My second time-stealer has been “Murder for Pleasure,” as Howard
Haycraft calls the detective story. Fortunately, I did not discover its existence
until my twenties, so I had a little time for study before then. The
great value of the detective story, quite apart from its educational aspects— 
after all, where else can one gather so much data about useful everyday 
questions such as tasteless poisons, homicidal law, and criminal psychol- 
ogy—is the fine training one gets in applied psychology and in the logic of 
everyday life. 

Sigmund Freud and Carl Gustav Jung can look to their laurels be- 
cause, after even a short course, we mystery-story fans can twist the slightest 
tongue-slip into a full confession, and impractical logic-choppers like Alfred 
North Whitehead and Bertrand Russell are pitifully pedestrian in their 
thinking, compared to the agile mental leaps we make—alongside Sherlock 
Holmes. We have to jump because Sherlock has a great reluctance to share 
his evidence with the reader. In fact, to put it bluntly, Sherlock cheated, 
but we excuse him because he performed before the rules were written. Now 
that we have rules of evidence for modern detective stories, all readers find 
it easy to search out the killer. (Anyway all readers I’ve ever met.) Indeed 
Robert Louis Stevenson commented upon this remarkable ability, “It is the 
difficulty of the police romance that the reader is always a person of such 
vastly greater ingenuity than the writer." And I personally have a sneaking 
suspicion that this is also one of the difficulties of Presidential Addresses. 
In best American style I wish to capitalize on my own deficiencies and take 
advantage of your ingenuity by pointing to a number of puzzles that seem
to me to need more detective work in this Society—some minor like parking 

*Presidential address delivered to the Psychometric Society, Washington, D. C., 
September 2, 1958. 



violations and normality, others major like murder and scales of measurement.

Let us begin with a minor problem [...] [lan]guage by our great detective Thurstone:
the Law of Comparative Judgment. When I talk with others [...] comparisons or one of
the related scaling methods, several questions arise: What about the need for [...]
an underlying psychological continuum? Students ask why I think that such a
complicated theory can possibly apply to attitudes, opinions, and psychophysical judgments.


To take up the last matter first, there is more than one attitude that [...]. One
attitude is that implied by [...]tric method used is a strong [...] as such by its users.
But an al[ternative ...] have a statistical technique that makes it pos[sible ...]. After
all, what we like about paired com[parisons ...] techniques, the judgments required seem
rather simple and basic. What we don't like is the substantial body of data
available at the end. Without further interpretation, if there are say 7 stimuli,
we have (7 × 6)/2 = 21 [...] to explain or to remember or 21n judgments,
where n is the number of subjects. This is a great deal of information
[...] [summ]arize it. If we can replace [...] each stimulus, and if, further, we provide
[...] can be accurately recaptured from [...] lost any information. [...] method of
paired comparisons [...] statistics, a method like the fitting of a [...] requires
relativ[ely] little [...] justification. One is not forced to [...], and its exist[ence ...]
known.

Let us turn to the Case of the Eerie [...], or Does Normality Really Matter?
In the original law of comparative judgment the [...] as a routine matter [...]
such responses are [...] relationship to the detective story here, [...] see why
there is a [...].

The family of distributi[ons us]ed in the analysis is not the family
corresponding to judgments [...] "response values" but that corresponding to the
[diff]erence of two such re[sponses ...] difference of two
independent random variables, identically distributed, except possibly for
slippage, then we could guarantee a symmetrical distribution; and insofar
as the original measurements might have been approximately normal, the
all-important distribution of differences would be more nearly normal (in
the sense of cumulants). Such reasoning might be used to justify the normality
assumption as a close approximation, but it is tenuous reasoning at
best.

Why is it tenuous? First, the independence assumption is not very 
appropriate, for reasons I will not give in detail. But, briefly, since the two 
stimuli are presented at about the same time, we expect the subject to be 
much more like himself in that little interval than like someone else at 
another time, or even than like himself at a distant time, and thus we do expect 
correlations. Second, we are not even dealing with an actual difference of 
random variables, but with single judgments about a difference. And there 
is nothing in the theory that pushes this distribution toward normality. We 
cannot therefore depend upon the previous train of reasoning to establish 
solidly the shape of the response curves or even to say much about the family 
of curves we use. On the other hand, maybe we can say that some family 
works well. 

Some kind of curve or family of curves is needed to grade the response 
percentages. Just how important the shape of this curve is has never been 
made clear. To put it another way, except for some work of Hull and his 
associates, there does not seem to be much research either empirical or theo- 
retical studying the impact of various shapes upon goodness of fit. The be- 
havior of the tails of the distribution could be important—though we don’t 
usually give much weight to the very extreme proportions like .999 or .001. 
Emphasis on the shape of tails is due to a rather practical remark of the 
late Charles Winsor; he pointed out that most distributions met in practice 
are approximately “normal in the middle." 

Perhaps a word more about the nature of the curves under discussion 
would not be amiss. We usually talk about the normal as if we were con- 
cerned with a frequency distribution. This notion arises from the model 
used to derive the curve. An alternative view is that we are dealing with 
the equivalent of a dosage response curve or an operating characteristic, and 
that as we move from left to right (increase the dose), the curve shows the 
percent that die from the administration of the poison. To push the matter 
further, some poisons have response curves that increase as we move from 
left to right, reach a plateau, and then decrease—a thing a cumulative dis- 
tribution does not do. I bring this out to emphasize that the interpretation 
of a distribution is not absolutely essential rather than to suggest the use of
pathological functions for grading responses when we are doing well without
them.
[Table 1A. Scale values for vegetable example using a variety of scaling devices.
Rows: Uniform, Arc sin √p, Case V, and two further curves whose labels are illegible
in the scan; columns: Turnips, Cabbage, Beets, Asparagus, Carrots, Spinach, String
beans, Peas, Corn. Entries not recoverable.]

[Table 1B. Scale values for Table 1A adjusted to give Turnips the position 0 and
Corn the position 1. Entries not recoverable.]

[Table 2. Correlations between scale positions by various methods. Entries not
recoverable.]

[Table 3. Comparison of recaptured proportions and observed for the "Carrot"
observations. Footnote: *Sum of absolute deviations from observed values. Entries
not recoverable.]

Without trying to carry out the research needed to discover the sensitivity
of the method to the curve, I shall give a notion of what can happen:
Table 1A shows the fitted values provided by a variety of different curves
used to grade the response percentages in an example of vegetable preferences
shown by Guilford [2]. The curves used were the cumulative distributions:
[...] the tails. Clearly the uniform distribution
has the lowest tails, arc sin √p a little higher, and so on. (The
densities should be thought of as distributed over the whole real line, and so 
the tails of the uniform have zero height.) In Table 1B we have transformed 
linearly so that the scale values run from 0 for the lowest to 1 for the highest 
stimulus. These transformed values show that the scale values are very 
similar. 
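The comparison described above can be tried on a small scale. The following is a minimal sketch, not a reproduction of Tables 1A and 1B: the five "true" scale positions are hypothetical, the proportions are generated from a Case V (normal) model rather than taken from Guilford's vegetable data, and the logistic curve is included only as one assumed example of a curve with higher tails than the normal. Each grading curve is applied through its inverse, and the scale value of a stimulus is taken as the column mean of the transformed proportions.

```python
import math
from statistics import NormalDist

# Hypothetical "true" scale positions for five stimuli (illustrative only).
TRUE_SCALE = [0.0, 0.3, 0.5, 0.9, 1.2]
_NORMAL = NormalDist()

# P[i][j]: proportion preferring stimulus j to stimulus i, generated here
# from a Case V (normal) model so that the right answer is known.
P = [[_NORMAL.cdf(sj - si) for sj in TRUE_SCALE] for si in TRUE_SCALE]

# Each grading curve is represented by the inverse of its cumulative form,
# centered so that a proportion of 0.5 maps to a difference of zero.
LINKS = {
    "uniform": lambda p: p - 0.5,
    "arc sine": lambda p: math.asin(math.sqrt(p)) - math.pi / 4,
    "normal": _NORMAL.inv_cdf,
    "logistic": lambda p: math.log(p / (1.0 - p)),
}


def scale_values(p, link):
    """Column means of the transformed proportions (one value per stimulus)."""
    n = len(p)
    return [sum(link(p[i][j]) for i in range(n)) / n for j in range(n)]


def rescale01(values):
    """Linear transformation sending the lowest value to 0, the highest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


SCALES = {name: scale_values(P, link) for name, link in LINKS.items()}

if __name__ == "__main__":
    for name, vals in SCALES.items():
        print(f"{name:9s}", [round(v, 3) for v in rescale01(vals)])
    names = list(SCALES)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(f"r({a}, {b}) = {pearson(SCALES[a], SCALES[b]):.4f}")
```

With proportions this far from 0 and 1, the rescaled values from the four curves should differ only slightly, which is the kind of agreement the text reports for the vegetable example.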

Table 2 shows this similarity by giving the correlations between stimulus
positions assigned by the various curves. Unless you are used to looking
at correlations between one set of numbers and a mild transformation of the
set, you will regard these correlations as rather high.

While the spacings are roughly invariant except for linear transformations,
the reproduction of the original percentages is not the same; these
reproductions are shown in Table 3 for one line of the original table. Note
that the sum of the absolute errors decreases as we move down the rows and
then increases. Appropriate comparisons should, of course, be made for the
whole table. I migh[t ...] exhibited here agrees with
my other experience: [...] arc sine were not as good as the
normal, and the normal is not quite [...] with higher tails.
[...] to explore the sensitivity of [...] [sh]ape of the curve used to grade
[...], then, would [...] judges. We need, too, a
rethinking of the theory of the paired comparison approach and its relatives,
especially in examples wh[ere ...] are recognizable by individuals and
are ordered rather consis[tently ...] [sug]gested by M. G. Kendall [3]
[...] [indi]viduals in a set of paired compari[sons ...]


as far as individuals are concerned. If so, we could, by examining the responses
of individuals in detail, capture the ranking proposed by each individual.
Then we could use as the scale value for a stimulus the average rank
assigned. This sounds like a lot of work, but when there is no inconsistency
within an individual it is possible to get the average ranks directly from the
paired comparison table quite easily. We just sum the fractions in a column,
including a 1 rather than a 0.500 in the main diagonal. This extra 1 adjusts
so that we rank from 1 to N (the number of stimuli) rather than 0 to N-1.
Now if we have slight inconsistencies this method will give an approximate
ranking.


                              TABLE 4

       Formation of Paired Comparisons Table from the Rankings*

                  Number
                  choosing                        Stimulus
Order   Ranking   order                 A             B             C

  1       ABC       n1         A        n          n3+n4+n6      n4+n5+n6
  2       ACB       n2
  3       BAC       n3         B     n1+n2+n5         n          n2+n5+n6
  4       BCA       n4
  5       CAB       n5         C     n1+n2+n3      n1+n3+n4         n
  6       CBA       n6

                              Sum   3n1+3n2+2n3   2n1+n2+3n3    n1+2n2+n3
                                    +n4+2n5+n6    +3n4+n5+2n6   +2n4+3n5+3n6

*Entries in paired comparison table should be divided by
n = Σ n_i to obtain proportions rather than frequencies.


Table 4 shows the method for three stimuli. There n_i is the number of
individuals responding in a particular manner; A, B, and C are labels for the 
stimuli; the ranks are 3, 2, and 1. The paired comparison table shows in the 
cell the total number of people preferring the stimulus at the head of its 
column to that listed at the left of its row. It is readily verified that the sum 
of the ranks for the stimuli is identical to the sums at the foot of the paired 
comparison table. 
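The identity just stated is easy to check numerically. The sketch below assumes hypothetical counts for the six possible orders of three stimuli and places the total n on the main diagonal, which is the frequency form of the "1 rather than 0.500" rule; the function names are illustrative and not part of the original address.

```python
from itertools import permutations

STIMULI = "ABC"
# Hypothetical numbers of individuals giving each of the 3! = 6 orders
# (n_1 ... n_6 for the orders ABC, ACB, BAC, BCA, CAB, CBA).
COUNTS = dict(zip(("".join(p) for p in permutations(STIMULI)),
                  (5, 3, 4, 2, 1, 6)))


def rank_sums(counts, stimuli):
    """Sum of ranks per stimulus; first choice gets rank N, last gets rank 1."""
    n_stim = len(stimuli)
    totals = {s: 0 for s in stimuli}
    for order, n in counts.items():
        for position, s in enumerate(order):
            totals[s] += n * (n_stim - position)
    return totals


def paired_table(counts, stimuli):
    """table[row][col] = number preferring the column stimulus to the row one.

    The diagonal holds the total n, the frequency analogue of a proportion 1.
    """
    total = sum(counts.values())
    table = {r: {c: 0 for c in stimuli} for r in stimuli}
    for r in stimuli:
        for c in stimuli:
            if r == c:
                table[r][c] = total
            else:
                table[r][c] = sum(n for order, n in counts.items()
                                  if order.index(c) < order.index(r))
    return table


TABLE = paired_table(COUNTS, STIMULI)
COLUMN_SUMS = {c: sum(TABLE[r][c] for r in STIMULI) for c in STIMULI}
RANK_SUMS = rank_sums(COUNTS, STIMULI)
N_TOTAL = sum(COUNTS.values())
# Dividing by n turns the column sums into average ranks.
AVERAGE_RANKS = {s: COLUMN_SUMS[s] / N_TOTAL for s in STIMULI}

print("column sums  :", COLUMN_SUMS)
print("rank sums    :", RANK_SUMS)
print("average ranks:", AVERAGE_RANKS)
```

Because every individual here is internally consistent (each supplies a full ranking), the column sums and the rank sums agree exactly; with inconsistent individuals the column sums would only approximate a ranking, as the text notes.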


There is an essential identity between this technique and the use of the
cumulative uniform distribution as the dosage response curve for paired
comparisons. In this interpretation, however, what is basic is the number of
individuals in the sample or population who choose each of the N! arrangements,
whereas in the usual analysis of paired comparisons it is the N(N-1)/2
preference percentages. In the paired comparisons approach we obtain a set
of N(N-1)/2 percentages, replace these by a set of N numbers. Then we use
differences of these N numbers (in Case V) together with a normal table or
other rule to approximately recapture the percentages. In the technique of
the average ranking, there is no way to recapture from the percentages the
[...]

Indeed, a few years ago a president of the Society, [...]
scientific detective created by R. Austin Freeman, proposed a technique for
factoring, or should I say fracturing, a set of individuals into sensible
sub[...]

[...] minor compared to The Tragedy of y, or The Mystery of the Missing
Corpus. By y, I mean the psychometric meas[urement ...] the main body of the
literature. We are now [...] yet we still do not have a general [...]
scaling—a set of theories that relates the various [...]. We need an extensive
experimental [...] [psychop]hysical continua, of attitude and [...]
[conti]nua is not logarithmic in character as [...] ax^[...], where y is the
value on the psychological continuum and x the value on the physical continuum).
He has used his results to argue that certain methods of measurement are
less appropriate than others, [...]


In addition to the intensive experimental work recently done by Stevens 
and his coworkers, there has been some instructive work in a mathematical
direction by R. Duncan Luce. This work considers the two more important 
measurement scales as classified by Stevens. (As mystery-story fans you will
want to be assured that this is the same Stevens we talked about before and 
not his long-lost Australian twin brother. It is.) These two scales are the 
interval and the ratio scale. The first is essentially determined up to linear 
transformations, that is, you are free to choose an origin and a unit of meas- 
urement arbitrarily, while the second or ratio scale leaves you free only to 
choose the unit of measurement. 

Luce inquires what laws could relate an interval or ratio scale corre- 
sponding to a physical continuum with an interval or ratio scale corresponding 
to a psychological continuum. (There are four possible pairings: interval- 
interval, interval-ratio, ratio-interval, ratio-ratio.) As a lever for this considera- 
tion he makes the innocuous assumption that admissible transformations on 
one continuum ought to produce only admissible transformations on the 
other and that insofar as the choice of origin or of unit of measurement 
is arbitrary on the two scales, it should continue to be arbitrary. That is, 
it should not be forced as a result of the transformation. I shall not give 
all of Luce's results, but I mention two. In order to have a ratio scale on 
both, the power law is the only possible psychophysical relationship. A little 
more usual situation with a ratio scale on the physical continuum and an 
interval scale on the psychological continuum leads either to a linear
transformation of the logarithm of the physical continuum or to a linear
transformation of an arbitrary power of the physical continuum (a slight
generalization of the power law in that it allows the addition of a constant).
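The admissibility requirement behind these two results can be illustrated numerically. This is only a sketch under assumed values: the constants a, b, c, the physical magnitudes, and the change of unit K are all hypothetical, and the code checks the direction "these laws behave admissibly", not Luce's proof that they are the only ones that do.

```python
import math


def power_law(x, a=2.0, b=0.6):
    """Ratio scale to ratio scale: y = a * x**b (a, b hypothetical)."""
    return a * x ** b


def log_law(x, a=2.0, c=1.0):
    """Ratio scale to interval scale: y = a * log(x) + c (a, c hypothetical)."""
    return a * math.log(x) + c


XS = [0.5, 1.0, 4.0, 10.0, 250.0]   # hypothetical physical magnitudes
K = 1000.0                          # change of physical unit

# Power law: changing the physical unit multiplies every psychological
# value by the same constant K**b, so only the psychological unit changes.
POWER_RATIOS = [power_law(K * x) / power_law(x) for x in XS]

# Logarithmic law: changing the physical unit adds the same constant
# a*log(K) to every psychological value, so only the origin shifts. That
# is admissible for an interval scale but would not be for a ratio scale.
LOG_SHIFTS = [log_law(K * x) - log_law(x) for x in XS]

print("power-law ratios:", [round(r, 6) for r in POWER_RATIOS])
print("log-law shifts  :", [round(s, 6) for s in LOG_SHIFTS])
```

In both cases the effect of the unit change on the psychological side does not depend on x, which is what keeps the choice of unit (or of origin) arbitrary rather than forced.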

These results are striking moves in one of the directions where research 
is needed. On the other hand, as a solution to the mystery story, they give 
one the cold shivers. Somehow the theories of physics are not so restrained 
in their formulas. How can this be? There are several matters that need 
clarification. First, there is often more than one variable involved in the 
relationship. The existence of additional variables would broaden the ad- 
missible functions. A few possibilities are: dependence upon the number of 
categories used, upon some aspects of a physical stimulus, or upon features 
of the instructions.

A second place to look for expansion of possible laws is in the matter of
the admissible transformations themselves. The two under consideration, 
the general linear transformation and the more restrictive multiplicative 
transformation, do not begin to exhaust the possible transformations even 
in the one-dimensional case. We are familiar, besides the nominal and ordinal 
classes, with simple translations and with the more complicated type of 
scale generated by the Coombs procedure, which he calls an ordered metric scale.
But many classes of transformations could be used that most of us have
never considered. And when I think, in addition, of the possibility of a slight
rumpling of continua owing to chance variation in functions, I begin to be 
a bit afraid of the strict interpretation of interval and ratio scales in the 
psychological experiments with which I am familiar. Nevertheless, I feel that
these scales are instructive and appropriate for the first investigations. 

While we are on this topic, there is the related problem of just what 
kind of statistic is appropriate in the presence of various types of scales of 
measurement. And that question sets me to wondering just what kind of a 
scale of measurement is appropriate for test scores. One might think either 
of items that are scored one or zero or of items that are scored in some other 
manner. Does doubling the score mean doubling the knowledge? 

If one knew one had exactly an ordinal scale, no more no less, then the
notion that order statistics or percentiles are appropriate measures and that
averages are not is well taken. But when I do not know exactly what kind
of scale I have, for instance, I may have nearly an interval scale in the test
score case, it seems sensible to use the statistics appropriate to the type of
[...]ction we may find the justification [...] [v]agueness is that we have not yet
[...] classes appropriate to real life meas[urement ...] and error variance. There
is some danger [...] measurement. We need a little room for [...] [the]ory that
every experimenter recognizes. [...] [l]aboratory situation will measure ever more
sharply.
Let us illustrate a little further the importance of a slightly soft theory.
It will be recalled that we obtai[ned ...] [diffe]rent scaling techniques [...]
[on]ly modest transformations of the scale. One might suppose that it is an
accident that such high correlations are achieved but perhaps a further
mathematical example will clarify the point. Suppose we consider the
transformations

$$y = bx + (1 - b)x^2 .$$

[...], it is easy to compute the correlation between x and y. And after a
little arithmetic, we find that the correlation for a given value of b is

$$\rho_b = \frac{1}{\sqrt{1 + (1 - b)^2/15}} .$$



Thus in the worst case with b = 0, we get a correlation of approximately 
0.97, or if you prefer to deal in squares of correlation coefficients, the value 
is exactly 15/16, or about 0.94. This shows that nonlinearly related scales 
can yield high correlations. And this supports the idea that we should not
too lightly abandon a statistical method because we cannot assure ourselves 
that we do have exactly an interval or ratio scale. 
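The figure of 15/16 can be verified exactly. The sketch below rests on one assumption that the surviving text does not state: x is taken to be uniformly distributed on the interval from 0 to 1, so that E(x^k) = 1/(k + 1). Under that assumption the squared correlation between x and y = bx + (1 - b)x² follows from moments alone, and exact fractions avoid any rounding.

```python
from fractions import Fraction
from math import sqrt


def moment(k):
    """E[x**k] for x uniform on [0, 1] (the assumed distribution)."""
    return Fraction(1, k + 1)


def rho_squared(b):
    """Exact squared correlation between x and y = b*x + (1 - b)*x**2."""
    b = Fraction(b)
    c = 1 - b
    mean_x = moment(1)
    mean_y = b * moment(1) + c * moment(2)
    var_x = moment(2) - mean_x ** 2
    # E[y**2] expanded term by term from (b*x + c*x**2)**2.
    mean_y2 = b * b * moment(2) + 2 * b * c * moment(3) + c * c * moment(4)
    var_y = mean_y2 - mean_y ** 2
    # E[x*y] = b*E[x**2] + c*E[x**3].
    cov_xy = b * moment(2) + c * moment(3) - mean_x * mean_y
    return cov_xy ** 2 / (var_x * var_y)


if __name__ == "__main__":
    for b in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
        r2 = rho_squared(b)
        print(f"b = {str(b):>3}: rho^2 = {r2}, rho = {sqrt(r2):.4f}")
```

For b = 0 the function returns 15/16, in agreement with the value quoted in the text, and the correlation rises steadily to 1 as b approaches 1, so even the purely quadratic rescaling stays close to linear over this range.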

An additional reason for vagueness is that we haven't tried to use the 
information contained in the purpose of making the measurements. Very 
often the whyness has a good deal to say about the choice of statistics, as 
you all know. But “why” is a matter that has not been too often raised in a 
discussion like the present one about measurement. The truth is that
mathematics alone will take one a certain distance, just as logic will help the
detective, but after a while the detective needs clues, and information about
motives and opportunities, just as we need facts about and motives for the 
measurement. 

I have tried to point to the need for broader theories relating methods 
of measurement, and I have pointed to the parallel need for experimental 
work. I shall look forward to your contributions to the search for the missing 
corpus in the journals. 


REFERENCES


[1] Gerard, H. B. and Shapiro, H. N. Determining the degree of inconsistency in a set of
paired comparisons. Psychometrika, 1958, 23, 33-46. 

[2] Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954. 

[3] Kendall, M. G. Advanced theory of statistics. London: Lippincott, 1943. 


Manuscript received 9/2/58 


PSYCHOMETRIKA—VOL. 23, NO. 4
DECEMBER, 1958 


SOME RELATIONS BETWEEN GUTTMAN'S PRINCIPAL 
COMPONENTS OF SCALE ANALYSIS AND 
OTHER PSYCHOMETRIC THEORY 


FREDERIC M. LORD
EDUCATIONAL TESTING SERVICE 


Guttman’s principal components for the weighting system are the item 
scoring weights that maximize the generalized Kuder-Richardson reliability 


coefficient. The principal component for any item is effectively the same as the 
factor loading of the item divided by the item standard deviation, the factor 
loadings being obtained from an ordinary factor analysis of the item inter- 
correlation matrix. 


By Guttman’s definition, a principal component of the score system for 
a set of item responses consists of a vector of numerical scores, one for each 
examinee. There is a corresponding principal component of the weighting 
system consisting of a vector of numerical weights, one for each possible 
response to each test item. The system has the property that the numerical 
score for each examinee is proportional to the sum of the weights for the 
item responses selected by him ([9], pp. 315-321). 

The first principal component constitutes the solution to the problem 
of assigning numerical scores to the examinees in such a fashion as to max- 
imize a certain correlation ratio ([5], pp. 327 ff.). There are also second, 
third, and subsequent principal components, each of which corresponds to 
a local maximum for the correlation ratio, and each of which therefore satis- 
fies the mathematical equations for a maximum. Each principal component 
of the weighting system is a latent vector of a matrix whose elements are 
certain “chi-square product-moments” ([5], p. 332). This is an n × n matrix,
where n is the total number of different possible responses (e.g., if the test
contains sixty 5-choice items, then n = 60 × 5 = 300). The principal components
of the scoring system may be obtained as the latent vectors of an
N × N matrix, where N is the number of examinees, or they may be derived
from the weights by simply adding together the weights of the responses
selected by each examinee.

The present article will show the following. 

1. Guttman’s principal components for the weighting system are the same 
as the sets of weights that will maximize the generalized Kuder-Richardson 
reliability coefficient. 

2. Guttman's principal components for the weighting system (and thus
the scoring weights for maximizing test reliability, also) are effectively the
same as certain sets of item weights obtained by factoring the matrix of item
intercorrelations. The weight for any item is equal to its unrotated factor
loading divided by its standard deviation. The matrix factored is the m × m
matrix of interitem product moment correlations, where m is the number of
items; the Hotelling (principal components) method of factor analysis ([10],
ch. 20) is used.


3. Guttman's principal com[ponents ...] correlated with the scores obtained
[...] so as to maximize the generalized [...] the factoring of a matrix with
[...]es the present method.

[...] first to give a proof that the Kuder-Richardson
reliability coefficient is maximized by a scoring formula that weights the
standardized item scores by their factor loadings on the first factor. The
[...] principal components to test reliability is treated [...]

Weighting Items to Maximize the Generalized
Kuder-Richardson Reliability Coefficient

A generalization of the Kuder-Richardson formula-20 reliability coefficient,
appropriate whenever the test items [...] other than zero and one, has been
d[...] by Hoyt and Stunkard [8] also can be shown to be math[ematically]
identical with the others. In his recent article, Tryon [11] [...] lines of
reasoning to justify this same coefficient.

The present section derives the scoring weights that will maximize
coefficient alpha. No reliability coefficient that is actually computable will
ever meet exactly the requirements that would be theoretically desirable in
an ideal coefficient. Some objections have been raised against coefficient
alpha. In particular, objection has been raised against any derivation that
[...] equally good measures of the same [...] coefficient are given [...]
however, and it seems likely that it will con[...] psychometric theory.
Further discussion [...] [appropri]ate here.

The general formula for coefficient alpha is

(1)    $\alpha = \frac{m}{m-1}\left(1 - \frac{\sum_i V_i}{V_t}\right),$



where $m$ is the number of test items, $V_t$ is the variance of the test scores,
and $V_i$ is the variance of the weighted scores on item $i$. Denote by $s_{ij}$ the
covariance between item $i$ and item $j$ before weighting the item scores,
and denote by $s_i^2$ the variance of item $i$ before weighting. Let $W_i$ be the
weight assigned to the score on item $i$, the test score of an examinee being
the weighted sum of the scores on responses chosen by him. It is readily
shown that, after weighting, $V_i = s_i^2 W_i^2$ and $V_t = \sum_i \sum_j s_{ij} W_i W_j$.
Coefficient alpha may thus be written

(2)    $\alpha = \frac{m}{m-1}\left(1 - \frac{\sum_i s_i^2 W_i^2}{\sum_i \sum_j s_{ij} W_i W_j}\right).$

The present problem is as follows. Given the values of $s_i$ and of $s_{ij}$ for
a given test, find the scoring weights, $W_i$, that will maximize coefficient
alpha. The value of $\alpha$ in (2) will remain unchanged if all the values of $W_i$
($i = 1, \cdots, m$) are multiplied by the same constant factor; hence, one
restriction may be arbitrarily imposed on the values of $W_i$. It is convenient
to require that the $W_i$ shall be so restricted that the variance of test scores
remains fixed:

(3)    $V_t = \sum_i \sum_j s_{ij} W_i W_j = \text{constant}.$

Since $\alpha$ can never be more than 1, the problem is seen to be one of
minimizing one quadratic form, $\sum_i s_i^2 W_i^2$, when another, $\sum_i \sum_j s_{ij} W_i W_j$, is
held constant. Denote the two quadratic forms in matrix notation by $w'D^2w$
and $w'Sw$, respectively, where $w$ is the column vector $[W_1, W_2, \cdots, W_m]'$,
$D$ is the diagonal matrix whose elements are $s_i$, and $S = [s_{ij}]$. A general
theorem on quadratic forms ([12], pp. 170-171) states that the values of
$W_i$ giving a stationary value are those satisfying the matrix equation

(4)    $(D^2 - \lambda S)w = 0,$

where $\lambda$ is a constant to be determined.

A more convenient equation is obtained by premultiplying (4) by
$-\lambda^{-1}D^{-1}$ and making the substitutions $w = D^{-1}u$, $\mu = 1/\lambda$, and
$D^{-1}SD^{-1} = R$:

(5)    $(R - \mu I)u = 0.$

Equation (5) is the characteristic equation of the matrix of interitem
product moment correlations, $R = [r_{ij}]$. The desired optimum weights for
maximizing $\alpha$ are thus

(6)    $W_i = U_i/s_i ,$

where $[U_1, U_2, \cdots, U_m]'$ is the first characteristic vector of the
correlation matrix.



The foregoing result may be summarized in slightly different terms by
saying that the test score having the maximum generalized Kuder-Richardson
formula-20 reliability coefficient is obtained by a scoring formula that weights
the standardized item scores by their factor loadings on their first principal
component. Here it is understood that attention is restricted to test scores that
are linear functions of the item scores, that the "standardized item score"
is the unweighted item score divided by $s_i$, and that the factor loadings are
obtained by a (principal components) factor analysis of the interitem
product moment correlations with unities in the diagonal.
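The result in (6) can be tried on a small example. The following is a minimal sketch under stated assumptions: the three item standard deviations and the correlation matrix are hypothetical, "first characteristic vector" is taken to mean the vector belonging to the largest latent root of R, and power iteration is used here only as a convenient way to find it (the article itself names the Hotelling method). With V_t held fixed, alpha is largest when the sum of the weighted item variances is smallest, and the largest root of R is the one that achieves this.

```python
import math
import random

# Hypothetical item standard deviations and interitem correlations (3 items).
SD = [1.0, 2.0, 0.5]
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
M = len(SD)
# Covariance matrix S = D R D, so that s_ij = r_ij * s_i * s_j.
S = [[R[i][j] * SD[i] * SD[j] for j in range(M)] for i in range(M)]


def alpha(w):
    """Coefficient alpha for item weights w, following equation (2)."""
    item_var = sum(S[i][i] * w[i] ** 2 for i in range(M))
    total_var = sum(S[i][j] * w[i] * w[j] for i in range(M) for j in range(M))
    return M / (M - 1) * (1.0 - item_var / total_var)


def first_characteristic_vector(matrix, iterations=500):
    """Power iteration: largest latent root and its unit-length vector."""
    size = len(matrix)
    u = [1.0] * size
    for _ in range(iterations):
        v = [sum(matrix[i][j] * u[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    # Rayleigh quotient u'Ru (u has unit length) gives the root.
    root = sum(u[i] * matrix[i][j] * u[j]
               for i in range(size) for j in range(size))
    return root, u


MU, U = first_characteristic_vector(R)
W_OPT = [U[i] / SD[i] for i in range(M)]   # equation (6)
ALPHA_OPT = alpha(W_OPT)
ALPHA_UNIT = alpha([1.0] * M)              # unweighted raw item scores

print("largest root of R :", round(MU, 4))
print("optimum weights   :", [round(w, 4) for w in W_OPT])
print("alpha, optimum    :", round(ALPHA_OPT, 4))
print("alpha, unit wts   :", round(ALPHA_UNIT, 4))
```

In this setup the optimum alpha equals m/(m - 1) times (1 - 1/μ), where μ is the largest root, so the gain over unit weights depends on how unequal the item standard deviations and loadings are.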


The results given here are essentially the same as those obtained by
Horst [7] for a different problem. He sho[wed ...] [prop]ortional to its factor
loading on the [...]. Edgerton and Kolbe [3] and Wilks [...] solution after
starting with still dif[ferent ...].

There are several other methods for determining [...] combining tests into a
composite ([4], [...]

Relation of Guttman's Principal Components to the Generalized
Kuder-Richardson Reliability Coefficient*

Let $z_{ic}$ be the scoring weight of alternative response $c$ for item $i$. Let
$y_{ia}$ be the score obtained by examinee $a$ on item $i$, so that $y_{ia} = z_{ic}$
whenever examinee $a$ chooses response $c$. Following Hoyt and Stunkard [8], let
$y_{\cdot a} = \sum_i y_{ia}$ (the total score of examinee $a$), $y_{i\cdot} = \sum_a y_{ia}$, and
$y_{\cdot\cdot} = \sum_i \sum_a y_{ia}$. An analysis of variance table may be written in part
as shown below.

                      Sum of Squares
Among examinees       $A = \frac{1}{m}\sum_a y_{\cdot a}^2 - \frac{1}{Nm}\, y_{\cdot\cdot}^2$
Among items           $B = \frac{1}{N}\sum_i y_{i\cdot}^2 - \frac{1}{Nm}\, y_{\cdot\cdot}^2$
Residual              $C = T - A - B$
Total                 $T = \sum_i \sum_a y_{ia}^2 - \frac{1}{Nm}\, y_{\cdot\cdot}^2$

*The writer's original mathematical proof restricted itself to the case of dichotomous
items. The simpler proof presented here, valid for polychotomous items, was worked out
subsequently by Professor Ledyard R Tucker.


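Now, by definition,

(7)    $V_i = \frac{1}{N}\sum_a y_{ia}^2 - \frac{1}{N^2}\, y_{i\cdot}^2 ;$

thus,

(8)    $\sum_i V_i = (T - B)/N.$

Likewise,

(9)    $V_t = \frac{1}{N}\sum_a y_{\cdot a}^2 - \frac{1}{N^2}\, y_{\cdot\cdot}^2 = mA/N.$

Consequently, from (1),

(10)    $\alpha = \frac{m}{m-1}\left(1 - \frac{T - B}{mA}\right) = 1 - \frac{C}{(m-1)A} .$

Guttman's principal components of the weighting system are the item
scoring weights that maximize the correlation ratio ([5], eq. 1)

(11)    $\eta^2 = A/T.$

Since $T = A + B + C$,

(12)    $\frac{1}{\eta^2} = 1 + \frac{B}{A} + \frac{C}{A} .$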

It only remains to show that the item scoring weights, $z_{ic}$, that maximize
(11) are the same as those that maximize (10).

Guttman has shown ([5], eq. 16) that his optimum item weights have
the property that the average score for an item ($y_{i\cdot}/N$) is the same for all
items. (Since the origin is arbitrary, Guttman makes this average equal to
zero.) Thus $B = 0$ whenever Guttman's optimum scoring weights are used.
This condition on the item weights imposes no restriction on the value of $\alpha$:
it is well known that test reliability depends on the spread of the scoring
weights assigned to the items, not on their average value.

When $B = 0$, it is seen from (12) and (10) that

(13)    $\frac{1}{\eta^2} = 1 + (m - 1)(1 - \alpha).$

It is obvious from (13) that $\eta^2$ will be maximized by maximizing $\alpha$, and
conversely.
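Relation (13) can be checked on artificial data. The sketch below is an illustration under assumptions, not part of the original proof: the item scores are hypothetical random numbers, each item is centered so that its total is zero (which makes B = 0), variances use the divisor N as in (7) and (9), and the correlation ratio is computed as A/T.

```python
import random

random.seed(1)
M_ITEMS, N_EXAMINEES = 4, 50

# Hypothetical weighted item scores Y[i][a] for item i and examinee a.
ability = [random.gauss(0, 1) for _ in range(N_EXAMINEES)]
Y = [[ability[a] + random.gauss(0, 1) for a in range(N_EXAMINEES)]
     for _ in range(M_ITEMS)]
# Center every item so that all item totals are zero, which makes B = 0.
Y = [[v - sum(row) / N_EXAMINEES for v in row] for row in Y]


def anova_and_alpha(y):
    """Return A, B, C, T, coefficient alpha from (1), and eta squared = A/T."""
    m, n = len(y), len(y[0])
    grand = sum(sum(row) for row in y)
    correction = grand ** 2 / (n * m)
    examinee_totals = [sum(y[i][a] for i in range(m)) for a in range(n)]
    item_totals = [sum(row) for row in y]
    a_ss = sum(t * t for t in examinee_totals) / m - correction
    b_ss = sum(t * t for t in item_totals) / n - correction
    t_ss = sum(v * v for row in y for v in row) - correction
    c_ss = t_ss - a_ss - b_ss
    # Variances with divisor N, as in equations (7) and (9).
    item_vars = [sum(v * v for v in row) / n - (sum(row) / n) ** 2 for row in y]
    total_var = sum(t * t for t in examinee_totals) / n - (grand / n) ** 2
    alpha = m / (m - 1) * (1 - sum(item_vars) / total_var)
    eta_sq = a_ss / t_ss
    return a_ss, b_ss, c_ss, t_ss, alpha, eta_sq


A_SS, B_SS, C_SS, T_SS, ALPHA, ETA_SQ = anova_and_alpha(Y)
LEFT = 1 / ETA_SQ
RIGHT = 1 + (M_ITEMS - 1) * (1 - ALPHA)
print("alpha =", round(ALPHA, 4), " eta^2 =", round(ETA_SQ, 4))
print("1/eta^2 =", round(LEFT, 6), " 1 + (m-1)(1-alpha) =", round(RIGHT, 6))
```

The two sides agree to rounding error for any centered score matrix, which is why maximizing one of the two quantities over the weights is the same as maximizing the other.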
REFERENCES

[1] Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika,
1951, 16, 297-334.
[2] Dressel, P. L. Some remarks on the Kuder-Richardson reliability coefficient.
Psychometrika, 1940, 5, 305-310.
[3] Edgerton, H. A. and Kolbe, L. E. The method of minimum variation for the
combination of criteria. Psychometrika, 1936, 1, 183-187.
[4] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.
[5] Guttman, L. The quantification of a class of attributes: a theory and method of scale
construction. In P. Horst (Ed.), The prediction of personal adjustment. Soc. Sci. Res.
Council, Bull. 48, 1941. Pp. 321-345.
[6] Guttman, L. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10,
225-282.
[7] Horst, P. Obtaining a composite measure from a number of different measures of the
same attribute. Psychometrika, 1936, 1, 53-60.
[8] Hoyt, C. J. and Stunkard, C. L. Estimation of test reliability for unrestricted item
scoring methods. Educ. psychol. Measmt, 1952, 12, 756-758.
[9] Stouffer, S. A. (Ed.) Measurement and prediction. Studies in social psychology in
World War II, Vol. IV. Princeton, N. J.: Princeton Univ. Press, 1950.
[10] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
[11] Tryon, R. C. Reliability and behavior domain validity: reformulation and historical
critique. Psychol. Bull., 1957, 54, 229-249.
[12] Turnbull, H. W. and Aitken, A. C. An introduction to the theory of canonical matrices.
Toronto: Blackie, 1950.
[13] Wilks, S. S. Weighting systems for linear functions of correlated variables when there
is no dependent variable. Psychometrika, 1938, 3, 23-40.
[14] Woodbury, M. A. and Lord, F. M. The most reliable composite with a specified true
score. Brit. J. statist. Psychol., 1956, 9, 21-28.

Manuscript received 9/11/57 
Revised manuscript received 2/11/58 


PSYCHOMETRIKA, VOL. 23, NO. 3
SEPTEMBER, 1958 


TO WHAT EXTENT CAN COMMUNALITIES REDUCE RANK?* 


LOUIS GUTTMAN
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH AND 
THE HEBREW UNIVERSITY IN JERUSALEM 


The question is raised as to whether the null hypothesis concerning
the number of common factors underlying a given set of correlations should
be that this number is small. Psychological and algebraic evidence indicate
that a more appropriate null hypothesis is that the number is relatively
large, and that smallness should be but an alternative hypothesis. The
question is also raised as to why approximation procedures should be aimed
primarily at the observed correlation matrix $R$ and not at, say, $R^{-1}$. What
may be best for $R$ may be worst for $R^{-1}$, and conversely, yet $R^{-1}$ is directly
involved in problems of multiple and partial regressions. It is shown that a
widely accepted inequality for the possible rank to which $R$ can be reduced,
when modified by communalities, is indeed false.


When Charles Spearman hypothesized, some fifty years ago, that 
correlations among certain mental test scores could be accounted for by but 
a single common factor, this was greeted by many of his colleagues with 
substantial scepticism. In current terminology, initiated by L. L. Thurstone, 
they thought it implausible that communalities could be found that would 
reduce the given type of observed correlation matrix to rank one. Thurstone 
later hypothesized that relatively small rank could be attained for correl- 
ation matrices of mental test data by use of communalities. This hypothesis, 
too, has encountered a measure of disbelief in various quarters. 

Motivation for seeking small rank stems from the desire to reproduce
the observed correlations among $n$ variables by using scores on a smaller
number, say $m$, of common factors. A necessary and sufficient condition
that there exist $m$ common factors to do the reproducing trick is that there
exist communalities that reduce the rank of the observed correlation matrix
to $m$, but leave the matrix Gramian [5]. In this sense, rank and number of
common factors are equivalent. It is algebraically convenient to deal with
rank in studying the possible number of common factors for a given set of
data.

Empirical attempts to estimate minimal rank in given cases have
hitherto not been clear cut for lack of rigorous theory and computing routines
for fallible data. Among the better efforts in this direction are the works
of Lawley [13] and Rao [16]. But these do not presume to do more than give
a lower bound to the minimal rank; they do not provide any upper bounds.

*This research was facilitated by a noncommitted grant-in-aid to the writer from
the Ford Foundation.




Regardless, if all results published to date be taken at their face value, they
together constitute abundant evidence against the Thurstone hypothesis.
Hundreds of different common factors for mental abilities have been
“identified” by Thurstonian computing routines, and the number is still
growing, with no upper limit in sight (cf. [2] and the discussions in [...]).

The association of the notion of communality coefficients with that
of minimal rank seems to be an historical accident, as it is not logically
necessary [4, 8, 9, 10]. Identifying the concept of small rank with that of
scientific parsimony also seems fortuitous, as this too has no logical
compulsion; other kinds of parsimony are possible [6, 12]. In view of the evidence
favoring large rank (even when communalities are used) for the entire
domain of mental abilities, it is fortunate for the communality concept
that it is useful and meaningful regardless. For example, a unique definition
[...]

[...], however, typically study but a small
sector of the domain of mental abilities and not the entire domain. Is it
[...] communalities will meaningfully reduce rank [...]? Is it
an algebraic fact that, psychological meaning aside, a given correlation
matrix of order $n$ can always be reduced to rank $m$, where the inequality
is satisfied,


$$m \geq \frac{2n + 1 - \sqrt{8n + 1}}{2}, \tag{1}$$


as was shown by several authors [1, 14, 18]?

The purpose of the present paper is to explore these last questions.
Despite the rather widespread belief in inequality (1), it can be proved false.
Its authors make a valiant attempt [...] against small rank being
the general algebraic case. For fallible data, this implies that large rank
should be the null hypothesis, [...] empirical evidence to the contrary.
[...] computing routines have the opposite orientation; their
justification [...]




Great Complexity as the Null Hypothesis for Mental Test Data 


Empirical research of the past five decades and more has consistently 
revealed positive correlations of varying sizes among mental test scores. 
On the face of it, the structure of the interrelations seems very complex; 
psychologists will tend to demand substantial evidence before they will 
accept any hypothesis that states otherwise.

For example, no one will seriously entertain the simplest hypothesis 
of zero rank. Suppose some experimenter were to take some standard mental 
tests, administer them to 25 pupils, apply a standard test of significance
to the sample correlations, and conclude that no population correlation 
coefficient was different from zero. The reaction of his colleagues might 
typically be to assert that the sample of 25 pupils was too small, and to 
reject the experimenter's conclusions despite his evidence. Zero correlation 
is generally not an appropriate null hypothesis for mental test data; it is 
rather an alternative hypothesis which might be acceptable in a given case 
only after weighty evidence against the general run of experience with positive 
correlations for such data. 

It might be noted that the newer Lawley-Rao tests of significance 
might also find that the over-all rank of the observed correlations, using 
communalities, was not significantly different from zero for the same data 
above. The zero-rank hypothesis would still find few buyers among 
psychologists. 

Statistical textbooks almost invariably start with zero as a null hypothesis 
for correlation coefficients. Historical reasons for this may lie in the early 
use of mathematical statistics in biology and other fields. Whatever the 
cause, such a habit need not be adopted uncritically by psychometricians, 
especially for mental test data with which they have had so much experience. 
It is just such experience which makes psychometricians regard not only 
zero rank, but also Spearman's and Thurstone's hypotheses of small rank 
to be alternatives, and to be acceptable only if the data warrant rejecting 
the hitherto more plausible null hypothesis of great complexity. 


How Did the Shoe Get On the Other Foot? 


Despite the fact that the Spearman-Thurstone type of hypothesis is
relatively novel, some followers of Thurstone appear to have reversed the
problem of how to test it. They seem to accept the hypothesis of small rank
a priori, and demand substantial evidence before they will believe in large
rank. This is indicated in their approach to the question of “when to stop
factoring.” That question arose historically out of certain computing routines
aimed at extracting one factor at a time from an empirical correlation matrix.
Somehow, the notion became current that there was a danger of extracting
too many factors, or more than were warranted by the data. This fear patently
puts the shoe on the other foot: it makes small rank the null hypothesis,
or it necessitates acceptance of the Thurstonian hypothesis unless there is
strong evidence to the contrary. Such a lenient attitude towards a new
hypothesis is quite unusual in the history of science.

But the attitude is not entirely unambiguous. Proponents of Thurstone's
thesis recognize to a certain extent that it should not be accorded the status
of a null hypothesis. They express concern about obtaining low rank as a
mere algebraic artifact from use of communalities, a concern that often
involves belief in inequality (1) above. Thus, it has been concluded from (1)
that “Three tests can always be reduced to rank 1 in a finite number of
ways ... Four tests can always be reduced to rank 2 in an infinite number of
ways ... with 6 tests it is in general possible to attain rank 3 without any
restrictions on the (observed correlation) coefficients ...” ([14], p. 92).

Belief in inequality (1) led Thurstone himself to suggest that the right
member of (1) be the null hypothesis for the minimal rank [cf. 18]. However,
no one has come forward with an explicit computing routine based on such
a null hypothesis. Inequality (1) can hardly be taken as a rigorous point
of departure, for it is in fact false. This is easy to show by a simple example,
as in the following section. The best possible universal upper bound to
minimal rank $m$ is actually $n - 1$.

The Best Possible General Upper Bound: n — 1 


It is well known that if $R$ is a nonsingular correlation matrix of order
$n$, then it can always be reduced at least to rank $n - 1$
[18]. One way of accomplishing such [...]
latent root of $R$ from each main diagonal element. If this root has
multiplicity $p$, then the rank of the modified [...]
select one value of $j$, and replace the [...] must be [...] any one of [...]
hence are proper.

Granted that nonsingular $R$ can always be reduced to rank $n - 1$, can
one in general do better than this? [...] example where
it is impossible to go below rank $n - 1$ suffices to answer this question from
a purely algebraic point of view.

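Consider the following symmetric matrix $R_n$ of order $n$:

$$
R_n = \begin{bmatrix}
x_1 & r_1 & 0 & 0 & \cdots & 0 & 0 \\
r_1 & x_2 & r_2 & 0 & \cdots & 0 & 0 \\
0 & r_2 & x_3 & r_3 & \cdots & 0 & 0 \\
0 & 0 & r_3 & x_4 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & x_{n-1} & r_{n-1} \\
0 & 0 & 0 & 0 & \cdots & r_{n-1} & x_n
\end{bmatrix} \tag{2}
$$

Assume that the $r_j$ are fixed numbers, observed correlation coefficients for
the problem of factor analysis, none of which vanish,

$$r_j \neq 0 \qquad (j = 1, 2, \ldots, n - 1). \tag{3}$$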


The main diagonal elements $x_j$ ($j = 1, 2, \ldots, n$) are arbitrary and may be
determined so as to minimize the rank of $R_n$. All other elements of $R_n$ vanish.

No matter what values are chosen for the $x_j$, the resulting rank of $R_n$
cannot be less than $n - 1$. To prove this, it is sufficient to examine the
submatrix of order $n - 1$ obtained by omitting the first row and last column
of $R_n$. This is a triangular matrix, whose main diagonal elements are the
$r_j$ and in which all elements below the main diagonal vanish. Its determinant
is merely the product of the $n - 1$ coefficients $r_j$, and is unaffected by the
choice of the $x_j$. By virtue of (3), this determinant does not vanish, so $R_n$
has at least one submatrix of rank $n - 1$. Hence, $R_n$ itself can never have
rank less than $n - 1$, no matter what values are used in the main diagonal.
This establishes our theorem that $n - 1$ cannot be improved on as a universal
upper bound to minimal rank.
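The determinant argument can be checked numerically. The sketch below is only an illustration with exact rational arithmetic: the values of $r_j$ and $x_j$ are arbitrary choices, and the helper names (`tridiagonal`, `det`, `corner_minor`) are hypothetical rather than anything from the paper.

```python
from fractions import Fraction
from math import prod


def tridiagonal(x, r):
    """Build a symmetric matrix of form (2) from diagonal x and off-diagonal r."""
    n = len(x)
    R = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = Fraction(x[j])
    for j in range(n - 1):
        R[j][j + 1] = R[j + 1][j] = Fraction(r[j])
    return R


def det(M):
    """Exact determinant by Gaussian elimination on a copy of M."""
    M = [row[:] for row in M]
    n, value = len(M), Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            value = -value  # a row swap flips the sign
        value *= M[c][c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            for k in range(c, n):
                M[i][k] -= factor * M[c][k]
    return value


def corner_minor(R):
    """Submatrix obtained by omitting the first row and the last column."""
    return [row[:-1] for row in R[1:]]


r = [Fraction(1, 2), Fraction(-1, 3), Fraction(1, 4)]  # illustrative nonzero r_j
for x in ([1, 1, 1, 1], [0, 0, 0, 0], [5, -2, 7, 3]):  # arbitrary diagonals
    # The minor equals the product of the r_j whatever the x_j are.
    assert det(corner_minor(tridiagonal(x, r))) == prod(r)
```

Because the minor never vanishes, no choice of diagonal can bring the rank of the full matrix below $n - 1$.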


Relative Algebraic Incidences of Large and Small Minimal Proper Ranks 


Note that our example of $R_n$ establishes a bound for rank for the set of
all symmetric matrices, not just Gramian ones. It has been recognized by
those who have believed in inequality (1) that the communalities implied
need not be proper. Some communalities might be larger than the
self-correlations of their corresponding observed variables, yielding so-called
Heywood cases [17], and/or the modified $R$ might not be Gramian. But our
example shows that even with improper communalities, rank need not be
reducible below $n - 1$. No restrictions whatsoever were put on the $x_j$, yet
$R_n$ cannot be of rank less than $n - 1$.

If the problem is restricted to use of proper communalities, it becomes
even clearer that small rank cannot be the general algebraic case. In [10]
it was shown that every nonsingular correlation matrix of order $n$ has a
one-to-one correspondence with another nonsingular correlation matrix
$R^*$ of order $n$ (where $R^*$ is directly related to $R^{-1}$) such that if $m$ and $m^*$
are the minimal ranks to which $R$ and $R^*$ are reducible respectively by use of
proper communalities, then in general

$$m + m^* \geq n. \tag{4}$$

Thus, $m$ and $m^*$ cannot in general simultaneously be small compared with $n$.
If $m^* < n/2$, then $m \geq n/2$, and if $m < n/2$ then $m^* \geq n/2$.

Therefore, considering the set of all possible nonsingular correlation
matrices of order $n$, the subset that can be properly reduced to rank $m < n/2$
cannot be larger (have a larger cardinal number) than the subset for which
proper minimal rank is $m \geq n/2$. More generally, the subset of matrices that
are properly reducible to rank 1 cannot be larger than the subset of matrices
that are properly reducible to rank $n - 1$; the subset for which $m = 2$ cannot
be larger than the subset for which $m = n - 2$, etc.

Since $m$ and $m^*$ can be, and often are, large simultaneously, should any
probability measure be attached to the set of all nonsingular correlation
matrices, it would in general show that large minimal proper rank is more
probable than small proper rank. (The measure implied here is over population
matrices; similar reasoning will of course hold for sample matrices, but
the latter case is not our present concern.)


Psychological Considerations 


Some readers may accept the algebraic theorems of the last two sections,
[...] section may seem to them too general, and
[...] psychological data. On the other hand,
[...] that the Thurstonian approach is an [...]
matrix. Such students should welcome [...]. Mathematical
statisticians who are [...] statistical tests of significance or decision
[...] general algebra.

[...] of the preceding section may
be but another example of putting the shoe on the wrong foot. The onus of
[...] complainers: if they believe psychological data differ
from others, it is they who should provide the evidence.

It so happens that the general algebraic case [...]
psychology to make out an a priori case for [...]


The Simplex As a Counter Example 

That $R_n$ is not typical of psychologically observed correlation matrices
is certainly correct. There is, however, a large class of matrices approximated
in psychological practice for which $n - 2$ is the minimal rank. These matrices
are closely related to $R_n^{-1}$, and proof of their rank properties is simplified
by use of the rank theorem on $R_n$.

Abundant empirical evidence testifies that when the kind of mental
ability tested is held constant and only level of complexity is varied among
tests, the resulting correlation matrix will tend to be some type of simplex
[6, 7, 11]. Let $R$ be the correlation matrix for a perfect additive simplex,
which is one of the possible types. Thus, if $r_{jk}$ is the coefficient in row $j$ and
column $k$ of $R$,

$$r_{jk} = a_j / a_k \qquad (j \leq k), \tag{5}$$

where $a_j$ is some coefficient associated with the $j$th variable ($j = 1, 2, \ldots, n$).
According to (5), tests of more similar levels of complexity (whose coefficients
$a_j$ are more nearly equal) will correlate more highly with each other than
will tests of less similar levels of complexity. The subscripts $j$ are assigned
the tests according to their order of complexity.

It has been shown in [6] that if $R$ is defined by (5) then $R^{-1}$ is precisely
of the form $R_n$ in (2), or

$$R^{-1} = R_n. \tag{6}$$
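The statement that the inverse of a perfect additive simplex has the tridiagonal form (2) can be illustrated on a small case. The sketch below is a minimal check with exact fractions; the coefficients `a = [1, 2, 3, 5]` are an arbitrary, strictly increasing choice made for illustration, and the helper names `simplex` and `inverse` are hypothetical.

```python
from fractions import Fraction


def simplex(a):
    """Matrix with r_jk = a_j / a_k for j <= k, assuming a is positive and increasing."""
    n = len(a)
    return [[Fraction(min(a[j], a[k]), max(a[j], a[k])) for k in range(n)]
            for j in range(n)]


def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; StopIteration if M is singular."""
    n = len(M)
    # Augment M with the identity matrix, then reduce the left half to I.
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for i in range(n):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [row[n:] for row in A]


a = [1, 2, 3, 5]  # illustrative complexity coefficients
n = len(a)
R_inv = inverse(simplex(a))

# Cells more than one step off the main diagonal vanish exactly ...
far = [R_inv[j][k] for j in range(n) for k in range(n) if abs(j - k) > 1]
# ... while the cells adjacent to the diagonal (the r_j of form (2)) do not.
near = [R_inv[j][j + 1] for j in range(n - 1)]
assert all(v == 0 for v in far)
assert all(v != 0 for v in near)
```

Exact rational arithmetic is used so that the zero cells are exactly zero rather than small floating-point residues.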


To what rank can $R$ be reduced by modifying its main diagonal? Let $D$
be an arbitrary diagonal matrix of order $n$. What is the minimum rank
possible for $R - D$? To answer this, first note that the rank of $R - D$ is
the same as that of $D - R$. It then follows from the identity

$$R^{-1}(D - R) = R^{-1}D - I, \tag{7}$$

where $I$ is the unit matrix of order $n$, that the rank of $R - D$ is
that of $R^{-1}D - I$; for the rank of the left member of (7) is that of $D - R$
since premultiplying by a nonsingular matrix does not affect rank. Recalling
(6), it is established that the rank of $R - D$ equals that of $R_nD - I$.

While $R_nD - I$ is an asymmetric matrix, if written out explicitly as
is $R_n$ in (2), it always has a nonvanishing minor determinant of order $n - 2$.
Two cases have to be considered: where $D$ has no vanishing diagonal elements,
and where $D$ has one or more vanishing diagonal elements. Proof is left to
the reader. It follows that $R_nD - I$ cannot be of rank less than $n - 2$ so
neither can $R - D$, which was to be proved.

Notice that $D$ has not been restricted here to yield proper communalities.
The diagonal elements of $D$ may be positive or negative, and $R - D$ need
not be Gramian. Regardless, $R - D$ can never attain rank less than $n - 2$
when $R$ is a nonsingular correlation matrix for a perfect additive simplex
defined by (5).
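Identity (7) and the rank equality it implies can be verified on a small case. The sketch below checks only these two facts for one arbitrary diagonal $D$ with no vanishing elements; it does not attempt the minor-determinant proof left to the reader. The coefficients `a`, the diagonal `d`, and all helper names are illustrative assumptions.

```python
from fractions import Fraction


def simplex(a):
    """Matrix with r_jk = a_j / a_k for j <= k, assuming a is positive and increasing."""
    n = len(a)
    return [[Fraction(min(a[j], a[k]), max(a[j], a[k])) for k in range(n)]
            for j in range(n)]


def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; StopIteration if M is singular."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for i in range(n):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [row[n:] for row in A]


def rank(M):
    """Exact rank by row reduction on a copy of M."""
    A = [list(row) for row in M]
    rows, cols, rk = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(rk, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for i in range(rk + 1, rows):
            f = A[i][c] / A[rk][c]
            A[i] = [vi - f * vr for vi, vr in zip(A[i], A[rk])]
        rk += 1
    return rk


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


a = [1, 2, 3, 5]                                                 # illustrative
d = [Fraction(1, 3), Fraction(-2), Fraction(1, 2), Fraction(4)]  # arbitrary D
n = len(a)
R = simplex(a)
R_inv = inverse(R)
D = [[d[j] if j == k else Fraction(0) for k in range(n)] for j in range(n)]
D_minus_R = [[D[j][k] - R[j][k] for k in range(n)] for j in range(n)]

left = matmul(R_inv, D_minus_R)                  # R^{-1}(D - R)
RD = matmul(R_inv, D)
right = [[RD[j][k] - int(j == k) for k in range(n)] for j in range(n)]  # R^{-1}D - I

assert left == right                             # identity (7)
assert rank(D_minus_R) == rank(right)            # ranks agree
print(rank(D_minus_R))
```

The same helpers can be reused to try other diagonals, including ones with vanishing elements, which the text treats as a separate case.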


Further Structural Psychological Possibilities 


Artificiality remains in using the algebra of a perfect simplex such
as defined by (5); data do not conform to this in practice, because in essence
(5) represents the structure of the correlations after correction for uniqueness.
Raw correlations conform more directly to the form

$$r_{jk} = a_j b_k \qquad (j < k), \tag{8}$$

where $b_k$ is proportional to $a_k^{-1}$, so that the communality of test $j$ is given
by the right member of (8) when $j = k$: $h_j^2 = a_j b_j$. Given a raw matrix with
[...] with unities down the main diagonal,
[...] less than $n - 2$ by modifying the main
[...] the proportionality relation of (5) to
[...] reduced in rank by tampering with the
[...]. Proof is left to the reader. (Write $b_k = c/a_k$,
where $c$ is some constant, and the proof is almost immediate.)

The inverse matrix for such an imperfect simplex as (8) is not in general
of the form $R_n$; the nice pattern of exactly zero cells no longer holds.
Regardless, rank cannot be reduced below $n - 2$ by communalities.

For the more general type of simplex analyzed in [7], it can be shown
that rank cannot be reduced in general below $n - 3$, whether in the perfect
case or in the imperfect case where the $\delta$-law of deviation holds. There are
many classes of psychological data for which such a model should be
appropriate, since it is formally identical with the stochastic process of uncorrelated
increments [3]. One psychological interpretation may be that of increments
in complexity, as we have already indicated. Another interpretation may
be that of increments in speed.

A study involving different speed levels of each of three different kinds
of content recently was published by Lord [15]. [...]
of correlation coefficients for each kind of content separately shows an
obvious gradient typical for a simplex [...]

The relative inadequacy of such an attempt at rank reduction should
become clear by considering a sub-battery of tests for one content alone, say
arithmetic. Suppose a pool of items of similar complexity were available,
and tests were constructed from them to be varied only in the time limits of
administration: 1 minute, 2 minutes, ..., $n$ minutes. What should the
structure of the resulting correlation matrix be? If the increments in scores
between successive time limits are uncorrelated with each other, as Lord's
data tend to show, then communalities cannot reduce rank below $n - 3$.
Furthermore, even such rank reducing communalities would not be the
most natural ones to use, for they would destroy the possibility of obtaining
the parsimoniously structured inverse matrix that exists for a simplex [7],
so important for prediction purposes.

Since speed is a pervasive concept in psychological abilities, as well as
level of complexity, there is thus further psychological argument against
expecting small reduced rank to be possible with actual psychological data.

Recent developments in facet design for kind of complexity, as
distinguished from level of complexity and from speed, provide further examples
of possible psychological structures for which an explicit algebraic structure
or common-factor pattern may be specified [12]. Given such explicit algebra,
it should be possible to develop theorems on what communalities can and
cannot do for such cases. The rapidity with which such designs attain
substantial complexity (but in a systematically simple manner) again suggests
that the concept of communality should be dissociated from that of rank
reduction. Instead, the concept might be defined in terms of the $\delta$-law of
deviation [6]: find that set of unique factor loadings to satisfy the $\delta$-law such
that the best fit can be obtained from the hypothesized algebraic structure
(according to the facet design) to the observed correlation matrix. This is
a type of parsimony that has little or nothing to do with rank.

It is not the purpose of the present paper to go into alternatives to
the rank-reduction approach to the communality problem. Some have already
been indicated, and others may be possible.


The Approximation Argument 
Those who have read this paper to this point may still feel no great
need for abandoning current computing procedures that attempt to stop at
as small a rank as possible. They may [...] all the above algebraic and
psychological arguments, but believe [...] point. That
there are many common factors in most given cases they may concede,
even as many as there are observed tests. But [...], they may
claim, should be interested only in that small number [...]
the observed correlations, and ignore the [...]
for the discrepancies. That is, apart
from unique factors, they advise [...] common factors that contribute
but little to the intercorrelations. Since only approximations are [...],
exact theorems on rank are irrelevant, they might say.

In reply to such an argument, it might be pointed out that usual,


[...] down when considering
[...] What is a good approximation
[...] There is no simple answer
[...] distribution that one
might be interested in estimating; a good approximation to one aspect may
[...] to another aspect.
[...] same token, the best single [...]
of $R^{-1}$ is the latent vector [...]
what is best for $R$ is worst for $R^{-1}$, and vice versa. Why should the
approximation [...] $R$, as is done traditionally, rather than to
$R^{-1}$? It is $R^{-1}$ that is directly involved in prediction problems concerning
the observed variables, and from this point of view $R^{-1}$ is more important
than $R$. Furthermore, if $D$ is nonsingular, it is easy to see that the rank
of $R - D$ is that of $R^{-1} - D^{-1}$ (postmultiply (7) through by $D^{-1}$, letting
$R$ now be arbitrary), but the approximation problem takes on a different
aspect in each case if posed in terms of rank.

[...] ought to be approximated no doubt has
contributed some confusion to the communality problem. (It might be
[...] approach, the approximations to both $R$ and $R^{-1}$ [...]

[...] [18] and Ledermann has
pointed out an earlier treatment by Shepard. Each [...]
number of unknowns is not less than
the number of equations, and arrive at (1).
[...] apparently claims (1) as an actual theorem.
But they, and many students following them, act as though it was a theorem
for all practical purposes. The fallacy, of course, results from the fact that
equations need not be solvable merely because the number of unknowns
equals, or even exceeds, the number of equations.

Many students have fallen into the habit of believing that n equations 
in n unknowns are in general solvable. High school and college drill on linear 
equations of course should lead to such a belief, since almost all the examples
the teachers choose are indeed solvable. It is another matter to inquire into 
the natural frequency of solvability. Even if one should agree, on geometric 
grounds, that solvability should hold more often than not for linear equations 
(i.e., collinearity or linear dependence is less likely to occur than not), this 
says little or nothing about nonlinear equations. For example, consider two 
simultaneous equations in two unknowns whose loci are each ellipses: are 
arbitrary ellipses in a plane more likely to intersect than not? Merely
counting equations and unknowns is not very helpful here.

The equations behind proposed inequality (1) are of quite a complex 
curvilinear nature. It borders on wishful thinking to believe a priori that 
they should be solvable at all in any given case, even if imaginary or other 
improper solutions be allowed. That it has been possible to develop concrete 
algebraic theorems on this problem of frequency of solvability, as discussed 
in this paper, may help dispel some prevalent miscomprehensions concerning
communalities and rank.

What has happened to factor analysis in this regard may be similar
to what happened much earlier with regard to belief in the normal (Gaussian)
curve for psychological data. As someone once pointed out, psychologists
thought it was the mathematicians who proved that the normal [...]
must hold for their data, while mathematicians [...] use of [...]
equation was justified because psychologists [...].

Some psychologists may have thought that [...]
hypothesis was justified by algebraic [...].
Statisticians may have thought that [...] null hypothesis
was justified by psychological considerations. [...]
and psychology both indicate large rank to [...] more proper null hypothesis
for the communality problem for mental test [...] null hypothesis is
sustained, this implies other concepts of parsimony [...]
for than that of small rank.

REFERENCES
[1] Burt, C. Bipolar factors as a cause of cyclic overlap. Brit. J. Psychol., Statist. Sect.,
1952, 7-202.
[2] Cattell, R. B. [...] universel pour les [...]. In L'analyse factorielle
et ses applications. Paris: Centre National de la Recherche Scientifique, 1956.
[3] Doob, J. L. Stochastic processes. New York: Wiley, 1954.
[4] Guttman, L. Image theory for the structure of quantitative variates. Psychometrika,
1953, 18, 277-296.
[5] Guttman, L. Some necessary conditions for common-factor analysis. Psychometrika,
1954, 19, 149-161.
[6] Guttman, L. A new approach to factor analysis: the radex. In P. F. Lazarsfeld (Ed.),
Mathematical thinking in the social sciences. Glencoe, Ill.: Free Press, 1954.
[7] Guttman, L. A generalized simplex for factor analysis. Psychometrika, 1955, 20,
173-192.
[8] Guttman, L. The determinacy of factor score matrices with implications for five
other basic problems of common-factor theory. Brit. J. statist. Psychol., 1955, 8,
65-81.
[9] Guttman, L. Une solution au problème des communautés. Bulletin du Centre d'Études
et Recherches Psychotechniques, 1956, 6, 123-198.
[10] Guttman, L. “Best possible” systematic estimates of communalities. Psychometrika,
1956, 21, 273-285.
[11] Guttman, L. Empirical verification of the radex structure of mental abilities and
personality traits. Educ. psychol. Measmt, 1957, 17, 391-407.
[12] Guttman, L. What lies ahead for factor analysis? Educ. psychol. Measmt, in press.
[13] Lawley, D. N. The estimation of factor loadings by the method of maximum
likelihood. Proc. Roy. Soc. Edin., 1940, 60, 64-82.
[14] Ledermann, W. On the rank of the reduced correlation matrix in multiple-factor
analysis. Psychometrika, 1937, 2, 85-93.
[15] Lord, F. M. A study of speed factors in tests and academic grades. Psychometrika,
1956, 21, 31-50.
[16] Rao, C. R. Estimation and tests of significance in factor analysis. Psychometrika,
1955, 20, 93-111.
[17] Thomson, G. H. The factorial analysis of human ability. (5th ed.) London: Univ.
London Press, 1951.
[18] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
[19] Wrigley, C. The distinction between common and specific variance in factor theory.
Brit. J. statist. Psychol., 1957, 10, 81-98.

Manuscript received 7/19/57
Revised manuscript received 1/3/58


PSYCHOMETRIKA, VOL. 23, NO. 4
DECEMBER, 1958 


A MARKOV MODEL FOR DISCRIMINATION LEARNING*


RICHARD C. ATKINSON 
UNIVERSITY OF CALIFORNIA, LOS ANGELES 


A theory for discrimination learning which incorporates the concept of an
observing response is presented. The theory is developed in detail for
experimental procedures in which two stimuli are employed and two responses are
available to the subject. Applications of the model to cases involving
probabilistic and nonprobabilistic schedules of reinforcement are considered; some
predictions are derived and compared with experimental results.


This paper is a preliminary attempt to develop a quantitative theory
of discrimination learning. For simplicity, the discussion will be limited to
two-response problems, but the formulation can be extended readily to
certain m-response situations. The model corresponds in some respects to
theoretical analyses of discrimination learning presented by Burke and
Estes [5] and Bush and Mosteller [6]. In particular, the stimulus
conceptualization and response conditioning process are similar to their
formulations. The model, however, differs from their work in that an orienting or
observing response [16] is postulated. This additional feature leads to
predictions which, for some experimental parameter values, are markedly
different from those made by either Burke and Estes or Bush and Mosteller,
while for other parameter values the predictions are identical. Interrelations
among these models will be considered later.

The theory is designed to analyze behavior in an experimental setup
where two stimuli, designated $T_1$ and $T_2$, are employed and two responses,
$A_1$ and $A_2$, are available to the subject. Each trial begins with the
presentation of either $T_1$ or $T_2$; the probability of $T_1$ is $\beta$, and the probability of $T_2$ is
$1 - \beta$. Following $T_1$, an $A_1$ response is correct with probability $\pi_1$, and an $A_2$
response is correct with probability $1 - \pi_1$. Following $T_2$, an $A_1$ response is
correct with probability $\pi_2$, and an $A_2$ response is correct with probability
$1 - \pi_2$.

The traditional type of discrimination problem is described when $\pi_1 = 1$
and $\pi_2 = 0$. The subject must learn to respond with $A_1$ to the presentation
of $T_1$ and respond with $A_2$ to the presentation of $T_2$. A form of
discrimination learning, involving probabilistic schedules of reinforcement, is specified
when $\pi_1, \pi_2 \neq 0$ or 1. This type of discrimination problem has been only
recently investigated [9, 10, 14].
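The trial schedule just described can be sketched as a small simulation. This is only an illustration of the experimental setup, not of the learning model itself; the function name `draw_trial` and the string labels are hypothetical, while `beta`, `pi1`, and `pi2` stand for the probability of $T_1$ and the two probabilities that $A_1$ is correct.

```python
import random


def draw_trial(rng, beta, pi1, pi2):
    """Return (stimulus, correct_response) for one trial of the schedule."""
    # T1 is presented with probability beta, T2 otherwise.
    stimulus = "T1" if rng.random() < beta else "T2"
    # A1 is correct with probability pi1 after T1 and pi2 after T2.
    p_a1_correct = pi1 if stimulus == "T1" else pi2
    correct = "A1" if rng.random() < p_a1_correct else "A2"
    return stimulus, correct


rng = random.Random(0)  # fixed seed so the run is repeatable

# Traditional discrimination problem: pi1 = 1 and pi2 = 0.
trials = [draw_trial(rng, beta=0.5, pi1=1.0, pi2=0.0) for _ in range(1000)]
assert all(c == ("A1" if s == "T1" else "A2") for s, c in trials)
```

Setting `pi1` and `pi2` strictly between 0 and 1 gives the probabilistic schedules mentioned at the end of the paragraph.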


*This research was supported by a grant from the National Science Foundation.


Theoretical Concepts 
Stimulus Representation 


The stimuli $T_1$ and $T_2$ are to be represented conceptually as two sets
of stimulus elements, which are designated ©₁ and ©₂, respectively. Further,
a set $C$ is designated which represents those stimulus elements common to
sets ©₁ and ©₂ ($C$ = ©₁ ∩ ©₂), i.e., those stimulus events common to the
presentation of either $T_1$ or $T_2$. In regard to the size of the $C$ set, an index
of similarity between $T_1$ and $T_2$ can be defined; the larger the relative size
of $C$ with respect to ©₁ and ©₂ the greater the similarity between the stimuli
[6].

To simplify subsequent notation let the set $S_1$ be all stimulus elements
in ©₁ but not in $C$. Similarly, $S_2$ is the set of all stimulus elements in ©₂
but not in $C$. Specifically,

(1)    $S_1$ = ©₁ − $C$,    $S_2$ = ©₂ − $C$.

Let $N_1$, $N_2$, and $N_c$ be the number of elements in sets $S_1$, $S_2$, and $C$,
respectively. Finally, define
= N à. m N 2 
(2) YW, = N, FN,’ We= 


Orienting Response 


It is hypothesized that the organism makes one of two responses at
the start of each trial, either an orienting response or a nonorienting
response. These responses are designated O and Ō, respectively. If O occurs,
then the organism is exposed to the unique stimulus elements on that trial.
More specifically, the organism will be exposed to only the S_1 stimulus
elements if T_1 is presented and to only the S_2 stimulus elements if T_2 is
presented. If, on the other hand, Ō occurs, then the organism is exposed to
both unique and common stimulus elements; that is, the organism will
be exposed to both S_1 and C stimulus elements if T_1 is presented and to
both S_2 and C stimulus elements if T_2 is presented.

It is assumed that the O and Ō responses are elicited by stimuli
associated with the beginning of the trial. Thus, the sequence of events on
a given trial is as follows.

(i) The onset of the trial is associated with the presentation of a set of
stimulus elements 𝒪.
(ii) 𝒪 elicits either an O or Ō response.
(iii) If O occurs, the organism is exposed to the S_1 set on T_1 trials and to
the S_2 set on T_2 trials. If Ō occurs, the organism is exposed to S_1 and
C on T_1 trials and to S_2 and C on T_2 trials.


Conditioning Relations and Response Probability


On any trial of an experiment, all elements of a given stimulus set are
conditioned to one and only one response. The entire 𝒪 set is conditioned
to either O or Ō. Similarly, the S_i set (i = 1 or 2) is conditioned to either
A_1 or A_2, and the C set is conditioned to either A_1 or A_2.

The probability of a response in the presence of particular stimulus
elements is defined as the proportion of stimulus elements conditioned to
the response [1, 8]. Thus, the probability of O on trial n, p_n(O), is either
1 or 0, depending on whether the 𝒪 set is conditioned to O or Ō at the start
of trial n. The probability of A_1 when only S_i (i = 1 or 2) is presented (i.e.,
when an O response has occurred at the start of the trial) is 1 or 0 depending
on whether the S_i set is conditioned to A_1 or A_2. Finally, the probability of
A_1 when S_i and C are presented (i.e., when an Ō has occurred at the start of
the trial) is (i) 1 if both the S_i set and the C set are conditioned to A_1,
(ii) W_i if the S_i set is conditioned to A_1 and the C set is conditioned to A_2,
(iii) 1 − W_i if the S_i set is conditioned to A_2 and the C set is conditioned to
A_1, and (iv) 0 if both the S_i set and the C set are conditioned to A_2.
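
The four cases above can be collected into one small function. The following Python sketch is illustrative only and is not part of the original paper; the function name `prob_a1` and its argument names are assumptions introduced here.

```python
def prob_a1(orienting: bool, s_i: int, c: int, w_i: float) -> float:
    """Probability of an A_1 response on a trial.

    orienting -- True if O occurred (only S_i is exposed), False for O-bar
    s_i, c    -- 1 or 2: the response (A_1 or A_2) to which S_i and C are conditioned
    w_i       -- W_i = N_i / (N_i + N_c), the share of exposed elements lying in S_i
    """
    if orienting:
        # Only the unique elements are exposed, so S_i alone decides the response.
        return 1.0 if s_i == 1 else 0.0
    if s_i == c:
        # Both exposed sets agree: cases (i) and (iv).
        return 1.0 if c == 1 else 0.0
    # The sets disagree: cases (ii) and (iii).
    return w_i if s_i == 1 else 1.0 - w_i
```

For instance, with W_i = .25 the call `prob_a1(False, 1, 2, 0.25)` gives W_i and `prob_a1(False, 2, 1, 0.25)` gives 1 − W_i.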


Conditioning Process 
A single parameter θ is assumed which governs the conditioning of
stimulus sets. On a given trial all elements from 𝒪 and available elements
from 𝒮_i will be conditioned with probability θ to an appropriate response,
and the conditioned status of all elements will remain unchanged with
probability 1 − θ. Only those elements in 𝒮_1 and 𝒮_2 which are exposed to
the organism on a given trial are available for conditioning. If an O response
is made at the start of the trial, then either S_1 or S_2 is available for
conditioning on the trial. If an Ō response is made, then either S_1 and C or
S_2 and C are available for conditioning. Specifically, the following cases
ensue.

(1) O → T_i → A_j → correct. An observing response is made and makes
set S_i available. The set S_i elicits A_j, which is designated correct. Given
this sequence of events, there is (i) a probability θ that all elements in 𝒪
will be conditioned to O and all elements in S_i will be conditioned to A_j,
and (ii) a probability 1 − θ that the conditional status of the elements will
remain unchanged.

(2) O → T_i → A_j → not correct. An observing response is made and
makes set S_i available. The set S_i elicits A_j, which is incorrect. Given this
sequence of events there is (i) a probability θ that all elements in 𝒪 will be
conditioned to Ō and all elements in S_i will be conditioned to the A response
other than the one which occurred on the trial, and (ii) a probability 1 − θ
that the conditional status of the elements will remain unchanged.

(3) Ō → T_i → A_j → correct. A nonobserving response is made and makes
sets S_i and C available. The sets S_i and C elicit A_j, which is correct. Given
this sequence of events there is (i) a probability θ that all elements in 𝒪 will
be conditioned to the Ō response and all elements in both S_i and C will be
conditioned to A_j, and (ii) a probability 1 − θ that the conditional status of
the elements will remain unchanged.

(4) Ō → T_i → A_j → not correct. A nonobserving response is made and
makes sets S_i and C available; the sets S_i and C elicit A_j, which is
incorrect. Given this sequence there is (i) a probability θ that all elements
in 𝒪 will be conditioned to O and all elements in both S_i and C will be
conditioned to the A response other than the one which occurred on the trial,
and (ii) a probability 1 − θ that the conditional status of the elements will
remain unchanged.

The above assumptions for conditioning and response probability are
different from those postulated by Estes and Burke in their stimulus
sampling model [5, 8]. No attempt will be made to compare the two sets of
assumptions, but it should be noted that the ideas fundamental to the model
presented in this paper initially were formalized within the framework of
the Estes and Burke stimulus sampling theory. Unfortunately, the
mathematical analysis resulted in a system of difference equations for which
methods of solution are not known. Consequently, certain simplifying
assumptions were made which yielded the present model. The difference in
complexity between the stimulus sampling formulation and the present
analysis is reflected in the state spaces of the respective stochastic processes.
For the model presented in this paper the state space includes only six points
{0, W_1, W_2, 1 − W_1, 1 − W_2, 1}, while the state space for the stimulus
sampling model is defined on the closed interval [0, 1].

Mathematical Formulation 


Given this conditioning process and the assumption that all stimulus
elements in a particular set (𝒪, S_1, C, S_2) are conditioned to the same
response at the start of the first trial, an organism can be described as being
in one of sixteen possible states on any trial. A state will be specified by an
ordered four-tuple where:

(i) the first member of the tuple indicates whether all elements in set 𝒪 are
conditioned to O or Ō;

(ii) the second member indicates whether elements in S_1 are conditioned to
A_1 or A_2;

(iii) the third member indicates whether elements in C are conditioned to
A_1 or A_2;

(iv) the fourth member indicates whether elements in S_2 are conditioned to
A_1 or A_2.

As an example, the state (Ō, 1, 1, 2) indicates that the 𝒪 set is conditioned
to Ō, S_1 is conditioned to A_1, C is conditioned to A_1, and S_2 is
conditioned to A_2. If the organism is in this state at the start of trial n, then if
T_1 is presented an A_1 will occur, and if T_2 is presented an A_2 will occur with
probability W_2 and an A_1 with probability 1 − W_2. The states will be assigned
identifying numbers as follows.


1. (O, 1, 1, 1)    5. (O, 2, 1, 1)     9. (Ō, 1, 1, 1)    13. (Ō, 2, 1, 1)
2. (O, 1, 1, 2)    6. (O, 2, 1, 2)    10. (Ō, 1, 1, 2)    14. (Ō, 2, 1, 2)
3. (O, 1, 2, 1)    7. (O, 2, 2, 1)    11. (Ō, 1, 2, 1)    15. (Ō, 2, 2, 1)
4. (O, 1, 2, 2)    8. (O, 2, 2, 2)    12. (Ō, 1, 2, 2)    16. (Ō, 2, 2, 2)


For these conditioning assumptions and the experimental parameters
β, π_1, and π_2, a transition matrix P describing the learning process can be
derived and is presented in Table 1. To simplify notation, in writing the P
matrix let a = θβπ_1, b = θβ(1 − π_1), c = θ(1 − β)π_2, and
d = θ(1 − β)(1 − π_2).

The state at the start of trial n is listed on the row, and the state at
the start of trial n + 1 is listed on the column. For example, a, the entry
in row 15, column 1, is the conditional probability of being in state (O, 1, 1, 1)
at the start of trial n + 1 given that the organism was in state (Ō, 2, 2, 1) at
the start of trial n.
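
The transition matrix can also be generated mechanically from the four conditioning cases. The following Python sketch is illustrative only and is not the program used in the paper; the names `build_states`, `p_a1`, and `transition_matrix` are assumptions introduced here, and the state order is assumed to follow the numbering 1 to 16 given above. It uses the fact that, in every case, the exposed sets end up conditioned to the correct response, while the orienting set keeps its response after a correct trial and switches after an error.

```python
import numpy as np

def build_states():
    # Orienting response varies slowest, then S_1, C, S_2 (states 1..16 in order).
    return [(o, s1, c, s2)
            for o in ("O", "Obar")
            for s1 in (1, 2) for c in (1, 2) for s2 in (1, 2)]

def p_a1(o, s_own, c, w):
    # Probability that A_1 is emitted, given the exposed sets.
    if o == "O":
        return 1.0 if s_own == 1 else 0.0
    if s_own == c:
        return 1.0 if c == 1 else 0.0
    return w if s_own == 1 else 1.0 - w

def transition_matrix(theta, beta, pi1, pi2, w1, w2):
    states = build_states()
    index = {s: k for k, s in enumerate(states)}
    P = np.zeros((16, 16))
    for (o, s1, c, s2) in states:
        i = index[(o, s1, c, s2)]
        P[i, i] += 1.0 - theta  # with probability 1 - theta nothing changes
        # Each tuple: stimulus, its probability, P(A_1 is correct), W for that stimulus.
        for stim, p_stim, p_corr1, w in ((1, beta, pi1, w1), (2, 1.0 - beta, pi2, w2)):
            s_own = s1 if stim == 1 else s2
            pa = p_a1(o, s_own, c, w)
            for resp, p_resp in ((1, pa), (2, 1.0 - pa)):
                for correct, p_corr in ((1, p_corr1), (2, 1.0 - p_corr1)):
                    prob = theta * p_stim * p_resp * p_corr
                    if prob == 0.0:
                        continue
                    # Correct trial: keep the orienting response; error: switch it.
                    if resp == correct:
                        new_o = o
                    else:
                        new_o = "Obar" if o == "O" else "O"
                    # Exposed sets become conditioned to the correct response;
                    # C is exposed only when O-bar occurred.
                    new_c = correct if o == "Obar" else c
                    new_s1 = correct if stim == 1 else s1
                    new_s2 = correct if stim == 2 else s2
                    P[i, index[(new_o, new_s1, new_c, new_s2)]] += prob
    return P
```

With θ = .05, β = .5, π_1 = 1, and π_2 = .5, the entry in row 15, column 1 (`P[14, 0]` with zero-based indexing) equals a = θβπ_1, in agreement with the example in the text, and every row sums to one.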

Let u_i(n) be the expected probability of being in state i (i = 1 to 16)
at the start of trial n, where the first experimental trial is n = 0. Define the
row matrix

(3)    U(n) = [u_1(n), u_2(n), ..., u_16(n)].

Further, let P represent the one-stage transition matrix of order sixteen
presented above, where p_ij is the conditional probability of being in state
j on trial n + 1, given that the system was in state i on trial n. Then the
Markov process describing discrimination learning at the start of trial n
is

(4)    U(n) = U(0)P^n.

(For a general consideration of finite Markov processes see [4, 11, or 12].
For applications of Markov processes to learning see [2, 3, 13].)
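
Equation (4) amounts to one matrix power and one vector-matrix product. The following Python sketch is illustrative only; the function name `state_distribution` is an assumption, and the two-state matrix in the usage note is a hypothetical toy chain, not the sixteen-state matrix of the model.

```python
import numpy as np

def state_distribution(u0, P, n):
    """Return U(n) = U(0) P^n for a row vector u0 and a one-stage matrix P."""
    u0 = np.asarray(u0, dtype=float)
    P = np.asarray(P, dtype=float)
    return u0 @ np.linalg.matrix_power(P, n)
```

For the toy chain `P = [[0.5, 0.5], [0.0, 1.0]]` started in its first state, `state_distribution([1.0, 0.0], P, 2)` puts probability .25 on the first state and .75 on the second.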
TABLE 1
The Transition Matrix P
(Rows give the state at the start of trial n, columns the state at the start
of trial n + 1. Each entry is listed as [column] value; all terms not listed
are zero.)

Row  1:  [1] 1 − θ + a + c    [10] d    [13] b
Row  2:  [2] 1 − θ + a + d    [9] c    [14] b
Row  3:  [3] 1 − θ + a + c    [12] d    [15] b
Row  4:  [4] 1 − θ + a + d    [11] c    [16] b
Row  5:  [5] 1 − θ + b + c    [9] a    [14] d
Row  6:  [6] 1 − θ + b + d    [10] a    [13] c
Row  7:  [7] 1 − θ + b + c    [11] a    [16] d
Row  8:  [8] 1 − θ + b + d    [12] a    [15] c
Row  9:  [4] d    [7] b    [9] 1 − θ + a + c
Row 10:  [1] cW_2    [4] d(1 − W_2)    [8] b    [9] c(1 − W_2)    [10] 1 − θ + a    [12] dW_2
Row 11:  [1] a(1 − W_1) + c(1 − W_2)    [4] dW_2    [7] bW_1    [9] aW_1 + cW_2    [11] 1 − θ    [12] d(1 − W_2)    [15] b(1 − W_1)
Row 12:  [1] c    [2] a(1 − W_1)    [8] bW_1    [10] aW_1    [12] 1 − θ + d    [16] b(1 − W_1)
Row 13:  [1] aW_1    [7] b(1 − W_1)    [8] d    [9] a(1 − W_1)    [13] 1 − θ + c    [15] bW_1
Row 14:  [2] aW_1    [5] cW_2    [8] b(1 − W_1) + d(1 − W_2)    [10] a(1 − W_1)    [13] c(1 − W_2)    [14] 1 − θ    [16] bW_1 + dW_2
Row 15:  [1] a    [5] c(1 − W_2)    [8] dW_2    [13] cW_2    [15] 1 − θ + b    [16] d(1 − W_2)
Row 16:  [2] a    [5] c    [16] 1 − θ + b + d

Experimentally it is impossible to identify individual states of the
process on a particular trial. That is, given information about the type of
trial and the A_i response which occurred, what state the organism was in at
the start of the trial cannot be specified. For example, if T_1 is presented
and a particular response occurs, which of the sixteen states the organism
was in cannot be determined unequivocally, since several states are
compatible with that outcome. Obviously, this confounding is a consequence
of the fact that O and Ō have been postulated.

Since trial descriptions and theoretical states cannot be placed in
one-to-one correspondence, it is necessary (for an experimental evaluation of
the theory) to define probabilities of events that are observable. Consequently,
the following probabilities are of particular interest: p_n(A_1 | T_1), the
expected conditional probability on trial n of A_1 given T_1; and p_n(A_1 | T_2),
the expected conditional probability on trial n of A_1 given T_2. By inspection
of the theoretical states it follows that


(5)    p_n(A_1 | T_1) = u_1(n) + u_2(n) + u_3(n) + u_4(n) + u_9(n) + u_10(n)
                        + W_1[u_11(n) + u_12(n)] + (1 − W_1)[u_13(n) + u_14(n)],

and

(6)    p_n(A_1 | T_2) = u_1(n) + u_3(n) + u_5(n) + u_7(n) + u_9(n) + u_13(n)
                        + W_2[u_11(n) + u_15(n)] + (1 − W_2)[u_10(n) + u_14(n)].

Also, for analytical purposes, the probability of an observing response at
the start of trial n will be useful.

(7)    p_n(O) = u_1(n) + u_2(n) + ... + u_8(n).
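
Equations (5) to (7) are weighted sums over the state probabilities. The following Python sketch is illustrative only; the function name `observables` is an assumption, and `u` is taken to be a sequence of sixteen probabilities ordered by state number.

```python
def observables(u, w1, w2):
    """Return (p(A_1 | T_1), p(A_1 | T_2), p(O)) from the state vector U(n)."""
    def s(*states):
        # Sum the probabilities of the listed states (numbered from 1).
        return sum(u[k - 1] for k in states)

    p_a1_t1 = s(1, 2, 3, 4, 9, 10) + w1 * s(11, 12) + (1.0 - w1) * s(13, 14)
    p_a1_t2 = s(1, 3, 5, 7, 9, 13) + w2 * s(11, 15) + (1.0 - w2) * s(10, 14)
    p_o = s(1, 2, 3, 4, 5, 6, 7, 8)
    return p_a1_t1, p_a1_t2, p_o
```

With all probability spread equally over states 9 to 16, both conditional probabilities equal .5 and p(O) is zero, whatever the values of W_1 and W_2.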


Analysis of the Model and Some Special Cases

To illustrate certain aspects of the theory, without going into extensive
mathematical detail, several special cases will be considered. These cases
are of particular interest experimentally. For each case illustrative learning
functions will be presented. The computations have been performed with
the following restrictions on parameter values: W_1 = W_2 = W; the initial
probability of O was taken to be zero and the initial probability of A_1 to
S_1, S_2, or C to be .5. That is, u_1(0) = u_2(0) = ... = u_8(0) = 0 and
u_9(0) = u_10(0) = ... = u_16(0) = 1/8.

The computations were carried out at the Western Data Processing
Center on a 650 IBM computer. The program or punch program deck is
available to anyone who is interested in generating theoretical functions
for cases or parameter values not presented in this paper. The program is
arranged so that the following information must be read into the machine's
memory: β, π_1, π_2, θ, W_1, W_2, and the vector U(0). The program will
compute U(n), p_n(A_1 | T_1), and p_n(A_1 | T_2) for specified values of n.

Before turning to the special cases, a general result can be immediately
established; the Cesàro asymptotic probability of any state in the model is
independent of the value of θ when θ > 0. This follows from the fact that
all terms on the main diagonal of the matrix P have the form (1 − θ) + θX,
while all other terms are of the form θY. For the case where θ = 0,
u_i(n) = u_i(0) for all n.
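
The independence of the asymptotic state probabilities from θ can be seen on a small example. Any matrix whose diagonal terms have the form (1 − θ) + θX and whose other terms have the form θY can be written as P = (1 − θ)I + θQ, and a row vector satisfies uP = u exactly when uQ = u. The following Python sketch is illustrative only; the two-state matrix `Q` is a hypothetical stand-in for the sixteen-state model, and the name `stationary` is an assumption.

```python
import numpy as np

def stationary(P):
    """Solve u P = u with the entries of u summing to one (least squares)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u

# Hypothetical two-state Q; its stationary vector is (5/13, 8/13).
Q = np.array([[0.2, 0.8],
              [0.5, 0.5]])

def p_for(theta):
    # Diagonal terms (1 - theta) + theta*X, off-diagonal terms theta*Y.
    return (1.0 - theta) * np.eye(2) + theta * Q
```

Calling `stationary(p_for(0.05))` and `stationary(p_for(0.5))` returns the same vector, so θ changes only the rate of approach and not the limit.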




Traditional Discrimination Learning 


The case in which π_1 = 1, π_2 = 0, and β ≠ 0 or 1 describes the often
investigated situation in which the subject is required to respond with A_1
to the presentation of T_1 and with A_2 to the presentation of T_2. An inspection
of the P matrix indicates that, for these particular parameter values, the
process eventually will be absorbed in either state (O, 1, 2, 2) or state
(O, 1, 1, 2) and, therefore, asymptotically

(8)    p_n(O) → 1,    p_n(A_1 | T_1) → 1,    p_n(A_1 | T_2) → 0.


FIGURE 1
Theoretical discrimination learning functions for two conditions of stimulus
similarity. The case in which π_1 = 1, π_2 = 0, and β = 1/2. (Abscissa: blocks
of 10 trials.)



Figure 1 presents several theoretical curves of p_n(A_1 | T_1) in blocks of
ten trials when θ = .05 and β = .5. The curves for p_n(A_1 | T_2) are not
presented, since for β = .5 and the above initial conditions p_n(A_1 | T_2) =
1 − p_n(A_1 | T_1). An interesting result is the relation between the functions
for different values of W. Taking W = .8 as the comparison function, on early
trials the W = .2 curve is below the comparison curve. However, by
approximately trial 130 the W = .2 curve crosses the comparison curve and
remains above it as they both approach unity. This appears to be a general
relationship for a fixed value of θ and the above set of initial conditions;
given W* > W**, on early trials the W* curve for p_n(A_1 | T_1) will fall above
the W** curve, but at some trial a crossover will occur, and thereafter the
W** curve will be above the W* curve as both approach unity. A proof of
this result has not been obtained; however, calculations using many
different values of W and θ in no case established a counterexample. This is
an interesting prediction, and one which should be verifiable. Unfortunately,
no evidence has been found in the literature to confirm or negate this result.
Research is now under way on this problem and is designed to manipulate
W experimentally by varying the apparent similarity between discriminanda.


The Estes and Burke Study 

The case in which π_1 = 1.0, π_2 = .5, and β = .5 describes a form of
discrimination learning investigated by Estes and Burke [9]. There are
several aspects to the study, but for the present analysis only the acquisition
process for the constant group will be considered.

Facing the subject is a circular array of 12 lights, and either the onset
of the six lights on the left half of the panel or the six on the right half is
designated as a T_1 trial and the onset of the other six as a T_2 trial. On each
trial the subject makes either an A_1 or A_2 response; this is followed by a
signal which informs the subject which response was correct.

In Figure 2 the observed conditional probabilities of p(A_1 | T_1) and
p(A_1 | T_2) in blocks of 20 trials are presented. On the average, for a block of
20 trials there were 10 T_1 trials and 10 T_2 trials. Consequently, in a block
of 20 trials the observed value of p(A_1 | T_1) for a given subject is based on
approximately 10 observations and p(A_1 | T_2) is also based on approximately
10 observations.
Listed on the same graph are some theoretical curves computed for
θ = .05 and for W = .1, .6, and .9. As can be seen, the W = .1 curves
provide a fairly close fit to the observed values for p_n(A_1 | T_1) and
p_n(A_1 | T_2). For this value of W the p_n(A_1 | T_1) curve approaches an
asymptotic level of .907, which closely approximates the observed terminal
values. More interesting, however, is the theoretical p_n(A_1 | T_2) curve for
W = .1. It starts out at .5, rises to a maximum value of .575 at trial 40, and
then monotonically decreases to an asymptotic level of .525. This initial
increase and subsequent decrease in the p(A_1 | T_2) curve is evident in the
Estes and Burke data and is an observation they emphasize in their
discussion of the results.

This case illustrates some general theoretical results; namely,
p_∞(A_1 | T_1) is closer to unity and p_∞(A_1 | T_2) is closer to π_2 the smaller
the value of W. Another prediction of experimental interest is that the
smaller the value of W the larger the maximum value of p_n(A_1 | T_2); the
trial on which the maximum is reached also depends on W. To illustrate, for
the above computations, the maximum of p_n(A_1 | T_2) for W = .1 was .575
and occurred on about trial 40, while the maximum for W = .9 was .540.

FIGURE 2
Observed values and theoretical functions of discrimination learning for the
case in which π_1 = 1, π_2 = 1/2, and β = 1/2. Theoretical results for three
conditions of stimulus similarity. (Abscissa: blocks of 20 trials.)



The Popper and Atkinson Study 


The final study to be considered used five groups [14]. For all groups
π_1 = .85 and β = .50. The groups differed with respect to the π_2 parameter,
which took on the values .85, .70, .50, .30, and .15 for Groups I to V,
respectively. In Figure 3 the observed proportions of A_1 responses following
both T_1 and T_2 stimuli for the last 120 trials are presented. The experiment
was run for a total of 320 trials. An inspection of the response curves by
trials indicated that a stable level of responding had been reached during
the last 120 trials. Consequently, the proportions presented in Figure 3 can
be used as estimates of p_∞(A_1 | T_1) and p_∞(A_1 | T_2).

In fitting these data, the observed asymptotic value of p_∞(A_1 | T_1) for
Group IV was used to evaluate W. The resulting estimate was W = .1.
Using this value, predictions were then generated of p_∞(A_1 | T_1) for the
other four groups and of p_∞(A_1 | T_2) for all five groups.


FIGURE 3
Predicted and observed asymptotic values for discrimination problems
involving five schedules of reinforcement. (Abscissa: experimental groups.)



The predicted values are given in Figure 3. No attempt will be made
to present a detailed analysis of the data except to note that the general
trends are approximated by the model. In particular, the model predicts a
convex relation between p_∞(A_1 | T_1) and the value of π_2 which is reflected
in the data. That is, for a fixed value of π_1, p_∞(A_1 | T_1) first decreases and
then increases as π_2 goes from .85 to .15. On the other hand, the predicted
values of p_∞(A_1 | T_2) for W = .1 are close to the π_2 value for all groups.


Discussion 


No rigorous attempt has been made to test the model for the special
cases considered. Nevertheless, qualitatively it appears that the model
accounts for some aspects of traditional types of discrimination learning and
can be extended without modification to discrimination problems involving
probabilistic reinforcement schedules.

Several studies currently in progress are designed specifically to test
various features of the theory. The variables analyzed encompass a broad
range of reinforcement schedules and include procedures designed to
manipulate the index of similarity between stimuli. It is hoped that these
investigations will provide a quantitative evaluation of the present theory
and will lead to a more satisfactory formalization of discrimination learning.

For readers familiar with the theoretical work of Burke and Estes [5]
and Bush and Mosteller [6] for discrimination learning, it may be helpful
to compare their models and the one presented in this paper. Of particular
interest are the asymptotic predictions generated by each model.

In the Bush and Mosteller model,

(9)    p_n(A_1 | T_1) → π_1,
       p_n(A_1 | T_2) → π_2,

independent of the value of β.

For Burke and Estes

(10)   p_n(A_1 | T_1) → π_1 W_1 + (1 − W_1)π̄,
       p_n(A_1 | T_2) → π_2 W_2 + (1 − W_2)π̄,

where π̄ = βπ_1 + (1 − β)π_2.

For the model presented in this paper, predicted asymptotes always lie
between the predictions of Bush-Mosteller and Burke-Estes. That is,
p_∞(A_1 | T_1) is bounded between π_1 and π_1 W_1 + (1 − W_1)π̄, while
p_∞(A_1 | T_2) is bounded between π_2 and π_2 W_2 + (1 − W_2)π̄.
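
The two bounding predictions in (9) and (10) are simple enough to tabulate directly. The following Python sketch is illustrative only; the function names are assumptions introduced here, and the formulas are those of (9) and (10) as read from the text.

```python
def bush_mosteller_asymptotes(pi1, pi2):
    """Equation (9): the asymptotes equal the reinforcement probabilities."""
    return pi1, pi2

def burke_estes_asymptotes(pi1, pi2, beta, w1, w2):
    """Equation (10): a mixture of each pi with the overall mean pi-bar."""
    pi_bar = beta * pi1 + (1.0 - beta) * pi2
    return (pi1 * w1 + (1.0 - w1) * pi_bar,
            pi2 * w2 + (1.0 - w2) * pi_bar)
```

For the schedule π_1 = .85, π_2 = .15, β = .5 with W_1 = W_2 = .1, equation (10) gives .535 and .465, against .85 and .15 from equation (9). When π_1 = π_2 = .85 the two sets of predictions coincide.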

These relationships are illustrated in Figure 3. Equation (9) predicts
that p_∞(A_1 | T_1) will fall on the straight line AB and p_∞(A_1 | T_2) will fall
on the straight line AC. In contrast, (10) predicts for W_1 = W_2 = .1 that
p_∞(A_1 | T_1) will fall on the straight line AD, while p_∞(A_1 | T_2) will fall on
the straight line AE. As indicated earlier, for the present model the predicted
value of p_∞(A_1 | T_1) falls on the convex function bounded between the straight
lines AB and AD, while p_∞(A_1 | T_2) falls on the function bounded between
lines AC and AE. Actually, to compute predictions for the Burke-Estes model,
one would estimate W from the data of one group, using an estimation
procedure appropriate to their model, and then predict the results of the other
groups, as was done for the model presented in this paper. The results
undoubtedly would be different from those indicated by lines AD and AE in
Figure 3, but would fall on straight lines with origins at A.

In conclusion, it appears that this model generates some interesting
predictions regarding both reinforcement schedules and similarity between
discriminanda. Objections might be raised concerning the particular
assumptions that were selected, but in the final analysis their evaluation will
be determined in the laboratory. Nevertheless, several aspects of the theory
leave the author uneasy; one feature is particularly disturbing and deserves
comment. Reference is made to the assumption in which a single
conditioning parameter θ is postulated. In essence, this assumption requires
that the acquisition of O or Ō will progress at the same rate as the acquisition
of A_1 or A_2. Intuitively this seems an improbable state of affairs which may
be approximated only for restricted experimental procedures. If this is the
case, a change in the theory will be required such that two conditioning
parameters are postulated, one governing the acquisition of O or Ō and the
other A_1 or A_2. This modification will still allow formalization of the model
as a sixteen-state Markov process. However, the P matrix would have many
more nonzero entries, and the theory would no longer yield asymptotic
response probabilities which are independent of the conditioning parameters.
These complications are not unmanageable, and if the modification proves
necessary, theoretical predictions still can be generated easily.


REFERENCES 
[1] Atkinson, R. C. A stochastic model for rote serial learning. Psychometrika, 1957, 22,
87-90.
[2] Atkinson, R. C. and Suppes, P. An analysis of a two-person game situation in terms of
statistical learning theory. J. exp. Psychol., 1958, 55, 369-378.
[3] Atkinson, R. C. and Suppes, P. An analysis of non-zero sum games in terms of a
Markov model for learning. Tech. Rep. No. 9, Contract NR 171-034, Appl. Math.
and Statist. Lab., Stanford Univ., 1957.
[4] Bartlett, M. S. An introduction to stochastic processes. Cambridge: Cambridge Univ.
Press, 1955.
[5] Burke, C. J. and Estes, W. K. A component model for stimulus variables in
discrimination learning. Psychometrika, 1957, 22, 133-145.
[6] Bush, R. R. and Mosteller, F. A model for stimulus generalization and discrimination.
Psychol. Rev., 1951, 58, 413-423.
[7] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955.
[8] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol.
Rev., 1953, 60, 276-286.
[9] Estes, W. K. and Burke, C. J. Application of a statistical model to simple discrimination
learning in human subjects. J. exp. Psychol., 1955, 50, 81-88.
[10] Estes, W. K., Burke, C. J., Atkinson, R. C., and Frankmann, J. P. Probabilistic
discrimination learning. J. exp. Psychol., 1957, 54, 233-239.
[11] Feller, W. An introduction to probability theory and its applications. New York: Wiley, 1950.
[12] Fréchet, M. Recherches théoriques modernes sur le calcul des probabilités, Vol. 2. Paris:
Gauthier-Villars, 1938.
[13] Kemeny, J. G. and Snell, J. L. Markov processes in learning theory. Psychometrika,
1957, 22, 221-230.
[14] Popper, J. and Atkinson, R. C. Discrimination learning in a verbal conditioning
situation. J. exp. Psychol., 1958, 56, 21-25.
[15] Restle, F. A theory of selective learning with probable reinforcements. Psychol. Rev.,
1957, 64, 182-191.
[16] Wyckoff, L. B., Jr. The role of observing responses in discrimination behavior. Psychol.
Rev., 1952, 59, 437-442.

Manuscript received 1/10/58
Revised manuscript received 5/30/58




PSYCHOMETRIKA—VOL. 23, No. 4 
DECEMBER, 1958 


REMARKS ON THE TEST OF SIGNIFICANCE FOR THE 
METHOD OF PAIRED COMPARISONS* 


R. DARRELL Bock 
UNIVERSITY OF CHICAGO†


A three-component model for comparative judgment which allows for
individual differences in preference is proposed. An implication of the model is 
that errors in the observed proportions due to sampling individuals in paired 
comparisons experiments are correlated. By neglecting this correlation, 
Mosteller's test for the method of paired comparisons tends to accept falsely 
the goodness of fit of the Case V solution. It is shown that bounds may be 
set for the correlation effect which make a valid test possible in some cases 
and provide useful standard errors for the estimated affective values. 


As a test of the goodness of fit of the model underlying a paired
comparisons solution, Thurstone [9] compared the observed proportions with
those reconstructed from the solution. If discrepancies between the observed
and derived proportions were small in practical terms, he considered the
solution internally consistent. In 1951, Mosteller [7] suggested the use of
the arcsine transformation for proportions to test whether the variance of
those discrepancies is in excess of that expected from the binomial sampling
variability of the observed proportions. Unfortunately, when this test is
applied to data from moderate sized samples, it persistently shows that the
discrepancies are smaller than those expected from sampling variability.
That is, the fit of the (Case V) model appears to be too good rather than
too poor. Mosteller's example shows this effect, as do most of the results
reported by Bliss [1], and in working with preference data the present author
has encountered it repeatedly.
It is shown in this paper that the anomalous behavior of Mosteller's
test is the result of assuming the sampling errors independent when in general
they are not. In brief, it is concluded that (i) under Case V assumptions,
sampling errors for comparisons involving a common object are correlated
and, on certain assumptions, bounds for this correlation may be set, and (ii)
with a minor alteration, Mosteller's test may be recast in the form of
analysis of variance and the effect of correlation on the variance due to
departure from the Case V solution may be derived. It is then apparent that
the binomial sampling variance used as the error term by Mosteller is in
general too large and that the test must frequently fail to detect departure
from internal consistency in the Case V solution.

*This paper was supported in part by the Quartermaster Food and [...].
Views and conclusions expressed herein are those of the author and do not
[...] views or endorsement [...]. [...] substantially improved an earlier
version of this paper, are gratefully acknowledged.
†Now at the University of North Carolina.


A Three-Component Model for Comparative Judgment 


Each of N individuals is required to choose the object which he prefers
in each of the possible pairs formed from n objects. Individual h thus makes
n(n − 1)/2 comparisons in some order, and in the tth comparison, which
consists, say, of objects i and j, his choice is assumed to be determined by
the momentary affective values that the objects have for him. These values
have the composition

(1)    Y_hit = u_i + v_hi + e_hit,
       Y_hjt = u_j + v_hj + e_hjt,

where u_i, u_j are components fixed for specific objects and common to all
individuals. These components are responsible for the concordance in
preference among the N individuals.

v_hi, v_hj are components peculiar to specific objects and individuals, and
random over the sample of individuals. These components are responsible
for the consistent differences among the preferences of different individuals.
Their distribution due to sampling individuals is bivariate normal
N(0, 0, τ_i², τ_j², ρ_ij) for all i, j. Assume Thurstone's Case V model, as
amended by Guttman [5], so that τ_i = τ and ρ_ij = ρ for all objects.

Finally, e_hit and e_hjt are error components which affect randomly the
momentary judgments of each individual and result in lack of transitivity
in the preferences (the circular triads of Kendall [6]). Over individuals they
are assumed to be independently distributed as N(0, σ²) for all objects.

It is supposed that object i is preferred to j when Y_hit > Y_hjt.
Alternatively, define the difference

(2)    X_hijt = Y_hit − Y_hjt = u_i − u_j + v_hi − v_hj + e_hit − e_hjt,

so that i is preferred to j when X_hijt > 0. The subscript t on X_hijt may be
dropped because subscripts for the objects involved, i, j, are sufficient to
identify the comparison.

From the Case V assumptions,

       E(X_hij) = u_i − u_j,
       var(X_hij) = 2τ²(1 − ρ) + 2σ².

Also for two comparisons sharing a common object, say i, j, and i, k,

       cov(X_hij, X_hik) = τ²(1 − ρ_ij − ρ_ik + ρ_jk)
                         = τ²(1 − ρ),

while for two comparisons not sharing a common object, say i, j, and k, l,

       cov(X_hij, X_hkl) = τ²(ρ_ik − ρ_il − ρ_jk + ρ_jl)
                         = 0.

Since judgments of different individuals are independent, the variances
and covariances of the mean differences taken over samples of N individuals
are 1/N times the variances and covariances for the single differences.
Accordingly, the correlation from one sample to another of mean differences
involving a common object is

(3)    ρ_X = τ²(1 − ρ) / [2τ²(1 − ρ) + 2σ²].

The ratio of variance in differences contributed by lack of transitivity within
individuals to that contributed by differences in preference which are
encountered in sampling individuals is defined as

       r = σ² / [τ²(1 − ρ)].

In terms of this definition (3) may be expressed as

       ρ_X = 1 / (2 + 2r).

It may be observed that when individuals differ in their preferences but
are perfectly transitive in their judgments, the correlations of the mean
affective differences involving a common object take on their maximum
value of 1/2. When the departure of the preferences of the individuals from
perfect concordance is due only to intransitivity, the correlations take on
their minimum value of zero.


Relationship of the Observations to the Model

The observations in the conventional paired comparisons experiment
consist of the expressed preference of each individual for one of the objects
in each of the n(n − 1)/2 pairs. For the pairs of objects i, j, and i, k, the
preferences of a single individual h may be represented by the formal variates

       t_hij = 1 (i preferred to j),
               0 (j preferred to i),

       t_hik = 1 (i preferred to k),
               0 (k preferred to i).
For randomly chosen individuals, t_{hij} and t_{hik} may be considered stochastic
variates, independent from individual to individual, with probability
distribution
P(t_{hij} = 1) = P_{ij}, P(t_{hij} = 0) = (1 - P_{ij}),
P(t_{hik} = 1) = P_{ik}, P(t_{hik} = 0) = (1 - P_{ik}),
P{(t_{hij} = 1) and (t_{hik} = 1)} = P_{ij,ik}.


For a sample of N individuals, the observed proportions of preferences
of i to j and i to k are


p_{ij} = (\sum t_{hij})/N,
p_{ik} = (\sum t_{hik})/N,


where the summation is understood to be over the N individuals. 
Since the individuals are selected independently, for samples of size N, 


E(p_{ij}) = P_{ij}
E(p_{ik}) = P_{ik}
var (p_{ij}) = P_{ij}(1 - P_{ij})/N
var (p_{ik}) = P_{ik}(1 - P_{ik})/N
cov (p_{ij}, p_{ik}) = (P_{ij,ik} - P_{ij}P_{ik})/N


(see [3], p. 192). The observations may, in fact, be regarded as coming from
a bivariate binomial process with parameters N, P_{ij}, P_{ik}, and P_{ij,ik}. For
this distribution the above results are well known ([6], p. 133).

The Thurstone solutions for paired comparisons data assume that the
expected proportions are connected with the mean affective difference \bar{X}_{ij} by
the normal response law


P_{ij} = (1/\sqrt{2\pi}) \int_{-\infty}^{\bar{X}_{ij}/\gamma} \exp(-y^2/2) dy, (-\infty < \bar{X}_{ij} < \infty)


where \gamma^2 = 2\sigma^2 (1 - \rho) + 2\delta^2 and \bar{X}_{ij} = E(X_{ij}) = \mu_i - \mu_j. Consequently,
the normal deviate corresponding to the sample proportion, p_{ij}, is taken as
an estimate of \bar{X}_{ij}/\gamma.

For statistical purposes the use of normal deviates as estimates of the
standardized mean affective differences is not convenient. The sampling
variances of the deviates depend upon the population proportions, P_{ij}, and
cannot be assumed constant as required for a simple least squares solution
and analysis of variance. This difficulty can be avoided by departing slightly
from the Thurstone solution and assuming the angular response law


(4) P_{ij} = (f/2) \int_{0}^{\bar{X}_{ij}/\gamma + c} \sin(fy) dy, (0 < \bar{X}_{ij}/\gamma + c < \pi/f)



where c and f are location and scale constants which determine the mean
and variance of the response distribution. For example, if \bar{X}_{ij}/\gamma ranges
between \pm(3/4)\pi, and c = (3/4)\pi and f = 2/3, the mean of the response
distribution is (3/4)\pi and the variance is (9/16)\pi^2 - 9/2 = 1.0516... .
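The numerical example above can be checked by direct integration. The following is only a sketch: it assumes that the response density implied by the angular law is (f/2) sin(fy) on 0 < y < \pi/f (the normalizing factor f/2 is an assumption made so that the density integrates to one), and the function name angular_moments is hypothetical.

```python
from math import pi, sin


def angular_moments(f, steps=20000):
    """Mean and variance of the assumed density (f/2)*sin(f*y) on 0 < y < pi/f.

    Uses a simple midpoint rule; `steps` controls the accuracy.
    """
    upper = pi / f
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        w = 0.5 * f * sin(f * y) * h  # probability mass in this slice
        m0 += w
        m1 += w * y
        m2 += w * y * y
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean


# Constants from the example in the text: c = (3/4)*pi and f = 2/3.
mean, var = angular_moments(2.0 / 3.0)
print(mean, 0.75 * pi)                   # numerical mean next to (3/4)*pi
print(var, 9.0 / 16.0 * pi ** 2 - 4.5)   # numerical variance next to (9/16)*pi^2 - 9/2
```

Under these assumptions the two printed pairs should agree closely, which supports reading f as a pure scale constant and c as the location of the mean.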

Since the estimates of affective value from the paired comparison solution
are unique only up to a linear transformation, these constants may be
incorporated into the transformation based on (4) and the angles
(5) x_{ij} = 2\sin^{-1} \sqrt{p_{ij}} - \pi/2, (-\pi/2 \le x_{ij} \le \pi/2)
defined as estimates of the mean affective differences with arbitrary unit.
The form (5) for the angular transformation is convenient because the angles
center about zero and their sampling variance is nearly stable at 1/N. Except
for the arbitrary unit, this transformation closely parallels the normal in
the range from P = .05 to .95 [4]. If the paired comparisons solution is
confined to proportions in or near this interval, the finite range of the x_{ij} will
present no theoretical difficulties. Furthermore, the results of this paper will
apply to a good degree of approximation to the Case V solution based on
the normal response law.
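The claim that the sampling variance of the angles is nearly stable at 1/N can be illustrated by an exact binomial calculation. This is a minimal sketch, not part of the original derivation; the function names and the choice N = 100 are assumptions made for the illustration.

```python
from math import asin, comb, pi, sqrt


def angle(p):
    """Angular transformation (5): x = 2*arcsin(sqrt(p)) - pi/2."""
    return 2.0 * asin(sqrt(p)) - pi / 2.0


def angle_variance(N, P):
    """Exact variance of angle(k/N) when k is binomial with parameters N and P."""
    mean = 0.0
    second = 0.0
    for k in range(N + 1):
        prob = comb(N, k) * P ** k * (1.0 - P) ** (N - k)
        x = angle(k / N)
        mean += prob * x
        second += prob * x * x
    return second - mean * mean


N = 100  # illustrative sample size
for P in (0.5, 0.7, 0.9):
    # N times the variance should stay close to 1 if the variance is near 1/N.
    print(P, N * angle_variance(N, P))
```

The product N var (x) stays close to one across these proportions, whereas the variance of the raw proportion, P(1 - P)/N, changes markedly with P.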

For two comparisons sharing a common object, moments of the asymptotic
distribution of the angles from samples of size N may be obtained by
expanding the right-hand member of (5) in a Taylor's series about P_{ij}.
Neglecting terms in which (p_{ij} - P_{ij}) appears in degree higher than the
first, and taking term by term expectations of the series, their squares, and
the product of the two series,


E(x_{ij}) \cong 2\sin^{-1} \sqrt{P_{ij}} - \pi/2
E(x_{ik}) \cong 2\sin^{-1} \sqrt{P_{ik}} - \pi/2
(6) var (x_{ij}) \cong 1/N
var (x_{ik}) \cong 1/N
cov (x_{ij}, x_{ik}) \cong (P_{ij,ik} - P_{ij}P_{ik}) / [N \sqrt{P_{ij}(1 - P_{ij})P_{ik}(1 - P_{ik})}]

Whence, the sampling correlation of the observed angles is

\rho_{x_{ij,ik}} \cong (P_{ij,ik} - P_{ij}P_{ik}) / \sqrt{P_{ij}(1 - P_{ij})P_{ik}(1 - P_{ik})}.


According to the three-component model, P_{ij,ik} will not in general equal
P_{ij}P_{ik}, since the comparisons share the common object i. As a result, the
sampling correlation of the angles within rows and columns of the paired
comparisons table will not vanish as required for a conventional analysis of
variance. In order to obtain bounds for the magnitude of this correlation effect
on Case V assumptions, it is of interest to study \rho_{x_{ij,ik}} as a function of
the parameters of the three-component model. Let \gamma = 1, so that the
distribution of X_{ij}, X_{ik} is


N(\bar{X}_{ij}, \bar{X}_{ik}, 1, \rho_X).


Then the marginal proportions P_{ij} and P_{ik} can be obtained by entering the
table of the normal distribution with \bar{X}_{ij} and \bar{X}_{ik} respectively. The joint
proportion P_{ij,ik} can be obtained from Pearson's table of the bivariate normal
distribution, entering with \bar{X}_{ij}, \bar{X}_{ik}, and \rho_X [8]. Values of \rho_{x_{ij,ik}} have
been calculated for selected values of these parameters and are shown
graphically in Figures 1 and 2.
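The table look-ups described above can be imitated numerically. The sketch below replaces Pearson's table with a Simpson's-rule integration of the bivariate normal distribution; that substitution, the function names, and the integration limits are assumptions of this illustration and not the procedure used for the figures.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def joint_proportion(a, b, rho, steps=2000):
    """P(Z1 < a, Z2 < b) for a standard bivariate normal with correlation rho.

    Integrates pdf(x) * P(Z2 < b | Z1 = x) over x from -8 to a by Simpson's
    rule. Requires |rho| < 1 and an even number of steps.
    """
    lower = -8.0
    if a <= lower:
        return 0.0
    h = (a - lower) / steps
    s = sqrt(1.0 - rho * rho)

    def integrand(x):
        return _STD.pdf(x) * _STD.cdf((b - rho * x) / s)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def angle_correlation(mean_ij, mean_ik, rho_X):
    """Sampling correlation of the observed angles, taking gamma = 1."""
    p_ij = _STD.cdf(mean_ij)
    p_ik = _STD.cdf(mean_ik)
    p_joint = joint_proportion(mean_ij, mean_ik, rho_X)
    return (p_joint - p_ij * p_ik) / sqrt(p_ij * (1 - p_ij) * p_ik * (1 - p_ik))


print(angle_correlation(0.0, 0.0, 0.5))  # both comparisons at a 50 per cent split
print(angle_correlation(0.0, 3.0, 0.5))  # one comparison nearly unanimous
```

With both means at zero and \rho_X = 1/2 the joint proportion is 1/3 and the correlation of the angles is 1/3; when the two means are far apart the correlation is close to zero.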


FIGURE 1
The correlation of the observed angles, \rho_{x_{ij,ik}}, as a function of the mean affective differences.
(Negative values of \bar{X}_{ij} yield a similar figure reversed from right to left.)


It is clear from Figure 1 that the sampling covariance of the observed
angles where a common object is involved, unlike the variance, is not
independent of the values assumed for the mean affective differences. If the
correlation \rho_X for the three-component model is assumed constant at 1/2,
\rho_{x_{ij,ik}} reaches a maximum of 1/3 when \bar{X}_{ij} = \bar{X}_{ik} = 0, that is, when a 50
per cent split is expected in the preferences for the two comparisons. As
\bar{X}_{ij} and \bar{X}_{ik} depart from one another, so that the preferences for the common
object approach 100 per cent in one of the comparisons, the correlation for
the observed angles falls to near zero. In Figure 2, \rho_{x_{ij,ik}} is seen to decrease
almost linearly with \rho_X from 1/3 to 0 when \bar{X}_{ij} and \bar{X}_{ik} are zero. For other
values of the means the change of \rho_{x_{ij,ik}} with \rho_X is restricted and more
curvilinear.


FIGURE 2
The correlation of the observed angles, \rho_{x_{ij,ik}}, as a function of the correlation of the mean affective
differences, \rho_X. (Curves are shown for \bar{X}_{ij} = \bar{X}_{ik} = 0.0, 1.0, and 2.0.)


In order to obtain a workable solution for the three-component model
it is necessary to assume that \rho_{x_{ij,ik}} is constant and equal to, say, \rho_x in all
comparisons. In general, this is not a good assumption, even if \rho_X can be
assumed constant, because when the mean affective values of the objects
differ greatly there will be many comparisons for which \rho_{x_{ij,ik}} is reduced by
extreme values for P_{ij} and P_{ik}. On the other hand, in precisely those cases
where the full efficiency of a least squares solution is required and statistical
significance is in question, i.e., when the objects differ little in affective value,
the proportions of all comparisons will be near enough to 50 per cent for the
assumption of constant \rho_{x_{ij,ik}} to be reasonable. The simple solution and
sampling criteria which result from this assumption are therefore of practical
interest.


An Analysis of Variance for the Case V Solution 
Let the observed angles (5) for a paired comparisons experiment be set
out in the form of (7).


                              Objects
                   1        2       ...      n       Sums

              1    0       x_12     ...     x_1n     x_1.
(7) Objects   2   x_21      0       ...     x_2n     x_2.
             ...
              n   x_n1     x_n2     ...      0       x_n.

           Sums   x_.1     x_.2     ...     x_.n      0

Since the grand mean for (7) is exactly zero, consider the elements of (7)
to have the composition


(8) x_{ij} = \alpha_i + \beta_j + \epsilon_{ij}.


That is, the row and column effects of (7) are considered additive. Then,
letting \beta_j = -\alpha_j, the observational equation (8) is of the simple subtractive
form specified by the model (2). Assuming the variance and correlation for
the error term are constant and taking account of the skew symmetry of (7),
for distinct h, i, j, k


E(\epsilon_{ij}) = 0,
E(\epsilon_{ij}^2) = \sigma^2,
E(\epsilon_{ij}\epsilon_{ji}) = -\sigma^2,
(9) E(\epsilon_{ij}\epsilon_{ik}) = \rho_x \sigma^2, (\sigma^2 \cong 1/N)
E(\epsilon_{ij}\epsilon_{jk}) = -\rho_x \sigma^2,
E(\epsilon_{hi}\epsilon_{jk}) = 0.
Writing (8) in terms of sample estimates
x_{ij} = a_i + b_j + e_{ij}


and minimizing the correlated error demonstrates that the correlation terms
drop out, because \rho_x is constant, and the normal equation (10) results.


(10) -x_{i.} + \sum_j b_j + n a_i = 0.


Because of the skew symmetry of (7), b_i = -a_i; then assuming \sum_j b_j = 0
as in a conventional analysis of variance,


a_i = \sum_j x_{ij}/n.
Because of the equivalence of (2) and (8), these a_i may be regarded as
estimates of the affective values of the objects, up to a linear transformation.
This means that Thurstone's Case V solution in terms of arcsines is a least
squares solution even under the assumption of constantly correlated errors
within rows and columns of the paired comparisons table.
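The solution just described amounts to transforming each observed proportion to an angle and averaging the rows of the resulting table. The following sketch illustrates this on a small table; the proportions are hypothetical values invented for the illustration, and the function names are not from the original.

```python
from math import asin, pi, sqrt


def angles_from_proportions(p):
    """Table (7): x_ij = 2*arcsin(sqrt(p_ij)) - pi/2, with zeros on the diagonal."""
    n = len(p)
    return [[0.0 if i == j else 2.0 * asin(sqrt(p[i][j])) - pi / 2.0
             for j in range(n)]
            for i in range(n)]


def affective_values(x):
    """Least squares estimates a_i = (sum over j of x_ij) / n."""
    n = len(x)
    return [sum(row) / n for row in x]


# Hypothetical proportions: p[i][j] is the proportion preferring object i to j,
# so that p[j][i] = 1 - p[i][j].
p = [[0.5, 0.6, 0.8],
     [0.4, 0.5, 0.7],
     [0.2, 0.3, 0.5]]

x = angles_from_proportions(p)
a = affective_values(x)
print(a)  # estimates sum to zero because table (7) is skew symmetric
```

Because 2 sin^{-1} \sqrt{1 - p} - \pi/2 is the negative of 2 sin^{-1} \sqrt{p} - \pi/2, the table of angles is skew symmetric and the estimates a_i sum to zero, as the derivation requires.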
For the analysis of variance associated with this solution the expectations
of the row, column, and residual sums of squares must be obtained
assuming the covariance structure for error given by (9). The sum of squares
between rows of (7) taken about the mean for the table (which is exactly
zero) is


SSA = (1/n) \sum_i x_{i.}^2

= (1/n) \sum_i (n\alpha_i + \sum_j \epsilon_{ij})^2

= (1/n) [n^2 \sum_i \alpha_i^2 + 2n \sum_i \alpha_i \sum_j \epsilon_{ij} + \sum_i (\sum_j \epsilon_{ij})^2].

Since the diagonal entries in (7) are without error, the expected sum of
squares is


E(SSA) = n \sum_i \alpha_i^2 + (n - 1)\sigma^2 + (n - 1)(n - 2)\rho_x \sigma^2.


Similarly, the expected sum of squares between columns, which is numerically
equal to that between rows, is


E(SSB) = n \sum_j \beta_j^2 + (n - 1)\sigma^2 + (n - 1)(n - 2)\rho_x \sigma^2.


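The expected sum of squares for the residual may be obtained by
subtraction or derived as follows.


Since \sum_i x_{i.} = 0 and \sum_j x_{.j} = 0,
SSR = \sum_i \sum_j x_{ij}^2 - \sum_i x_{i.}^2/n - \sum_j x_{.j}^2/n.
Then


SSR = \sum_i \sum_j (\alpha_i + \beta_j + \epsilon_{ij})^2
- (1/n) \sum_i (n\alpha_i + \sum_j \epsilon_{ij})^2 - (1/n) \sum_j (n\beta_j + \sum_i \epsilon_{ij})^2

= \sum_i \sum_j \epsilon_{ij}^2 - (1/n) \sum_i (\sum_j \epsilon_{ij})^2 - (1/n) \sum_j (\sum_i \epsilon_{ij})^2,


E(SSR) = (n - 1)(n - 2)\sigma^2 - 2(n - 1)(n - 2)\rho_x \sigma^2.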


These results are collected in Table 1. 
It may be noted that 


var (a_i) = [(n - 1)/n^2] [1 + (n - 2)\rho_x] \sigma^2;
(11) cov (a_i, a_j) = -(1/n^2) [1 + (n - 2)\rho_x] \sigma^2;


var (a_i - a_j) = (2/n) [1 + (n - 2)\rho_x] \sigma^2.

TABLE 1
Analysis of Variance for the Case V Solution*


(Assuming a three-component model and the arcsine
transformation of the observed proportions)


Source            d.f.              Sums of squares                  Expected sums of squares

Between objects   n - 1             SSA = (1/n) \sum_i x_{i.}^2      E(SSA) = (n - 1)[1 + (n - 2)\rho_x]\sigma^2 + n \sum_i \alpha_i^2

Residual          (n - 1)(n - 2)/2  SSR = SST - SSA                  E(SSR) = (1/2)(n - 1)(n - 2)(1 - 2\rho_x)\sigma^2

Total             n(n - 1)/2        SST = \sum\sum_{i<j} x_{ij}^2    [\sigma^2 \cong 1/N]


* Based on values in the upper half of the paired comparisons table only.
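The entries of Table 1 may be computed directly from the table of angles. The sketch below is illustrative only; the function name and the small error-free table used as input are assumptions. For a table that is exactly additive, x_{ij} = a_i - a_j, the residual sum of squares should vanish.

```python
def case_v_anova(x):
    """Sums of squares and degrees of freedom of Table 1 for an n-by-n table of angles."""
    n = len(x)
    row_sums = [sum(row) for row in x]
    ssa = sum(s * s for s in row_sums) / n
    # Total sum of squares uses the upper half of the table only (i < j).
    sst = sum(x[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
    return {
        "SSA": ssa,
        "SSR": sst - ssa,
        "SST": sst,
        "df_A": n - 1,
        "df_R": (n - 1) * (n - 2) // 2,
        "df_T": n * (n - 1) // 2,
    }


# Hypothetical error-free table built from affective values 0.3, 0.0, -0.3.
values = [0.3, 0.0, -0.3]
x = [[vi - vj for vj in values] for vi in values]
table = case_v_anova(x)
print(table)
```

In this error-free case SSA and SST coincide, so the whole of the variation in the upper half of the table is attributed to differences between objects.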


Discussion 


Except for the approximation of the normal response law by the angular,
Mosteller's test for the goodness of fit of the Case V solution is equivalent to
a \chi^2 test, on (n - 1)(n - 2)/2 degrees of freedom, calculated by dividing
the residual sum of squares in Table 1 by \sigma^2. If \rho_x is greater than zero,
however, the expectation of this \chi^2 is not (n - 1)(n - 2)/2 but is diminished
by a factor of (1 - 2\rho_x). According to the results established previously, \rho_x
approaches 1/3 if the preferences of the individuals are completely transitive,
the P_{ij} approach 1/2, and N is large enough to justify the approximations
in (6). Under these conditions the \chi^2 for the test of goodness of fit would be
significantly small if the data conform to the model. The three-component
model is, therefore, able to account for the aberrant behavior of Mosteller's
test. When the individuals are not transitive in their preferences or there are
many extreme proportions of preference, the reduction of the residual \chi^2
would be less apparent.

Similarly, a \chi^2 on (n - 1) degrees of freedom for testing significance of
differences in the affective values of the objects, if calculated from SSA/\sigma^2,
is augmented in expectation by a factor of [1 + (n - 2)\rho_x]. Thus, when
correlated error is present and its effect ignored, the investigator will be more
confident than is warranted about the significance of differences among and
the stability of the estimated affective values.

It has been shown by Walsh [10] that correlation of the observations can
be considered to alter the effective variance of sampling error. Its effect can
be nullified, however, by incorporating the altered variance in the sampling
statistics derived from the error distribution. Thus, the correct \chi^2 statistics
for Table 1 are


(12) \chi_R^2 = SSR / [(1 - 2\rho_x)\sigma^2], d.f. = (n - 1)(n - 2)/2,


(13) \chi_A^2 = SSA / {[1 + (n - 2)\rho_x]\sigma^2}, d.f. = (n - 1),

for the residual and between-object variance respectively.

Since there appears to be no very satisfactory way of estimating \rho_x
or even assuring its constancy, the corrected \chi^2's and the estimated variances
(11) cannot provide exact tests of significance or standard errors. However,
bounding conditions which should be useful in some applications may be
established. For example, if \rho_x is assumed to take on its maximum value, 1/3,
a residual \chi^2 with the correlation effect cancelled may be computed from (12).
If this \chi^2 is not significant, we can be confident that there is no evidence of
departure from the model. Conversely, if \rho_x is assumed equal to zero and the
\chi^2 is significant, there is evidence of departure. For intermediate cases no
conclusion can be drawn.

Similarly, if \chi^2 for between-object variation is significant when computed
from (13) setting \rho_x = 1/3, there is evidence of significant discrimination.
Conversely, if this \chi^2 is not significant when \rho_x = 0 is used, there is no
evidence of significant discrimination among the objects. Intermediate results
remain uncertain.
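The bounding argument can be made concrete by evaluating the corrected statistics at the two extreme values of the correlation. This is a minimal sketch: it takes the corrected statistics to be SSR/[(1 - 2\rho_x)\sigma^2] and SSA/{[1 + (n - 2)\rho_x]\sigma^2} with \sigma^2 = 1/N, and the input sums of squares, n, and N are hypothetical values chosen for the illustration.

```python
def corrected_chi_squares(ssa, ssr, n, N, rho_x):
    """Residual and between-object chi-squares corrected for correlation rho_x.

    Assumes the error variance of an angle is sigma^2 = 1/N.
    """
    sigma2 = 1.0 / N
    chi_residual = ssr / ((1.0 - 2.0 * rho_x) * sigma2)
    chi_objects = ssa / ((1.0 + (n - 2) * rho_x) * sigma2)
    return chi_residual, chi_objects


# Hypothetical results for n = 3 objects judged by N = 100 individuals.
ssa, ssr, n, N = 0.54, 0.02, 3, 100

at_zero = corrected_chi_squares(ssa, ssr, n, N, 0.0)       # no correlation
at_max = corrected_chi_squares(ssa, ssr, n, N, 1.0 / 3.0)  # maximum correlation
print(at_zero)
print(at_max)
```

The residual statistic is largest at \rho_x = 1/3 and the between-object statistic is smallest there, so each pair of values brackets the statistic that would be obtained if \rho_x were known.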

For standard errors of the estimated affective values, it should suffice
in many cases to obtain upper bounds by assuming the condition most
unfavorable to discrimination among the objects, namely, \rho_x = 1/3.
Substituting this value in (11) and taking square roots gives maximal standard
errors for the estimated values and their differences.

Finally, it should be pointed out that for paired comparisons obtained
from the repeated judgments of the same individual and with the identity
of the objects concealed, as, for example, in organoleptic testing, \rho_x could
probably be assumed to vanish. This assumes that there is no replication
effect specific to particular objects. In this case, the expected sums of squares
in Table 1 would reduce to their ordinary form and exact tests of the residual
and between object variance would be possible. Thus, paired comparisons
methods in organoleptic testing, such as Bradley's [2], are valid when confined
to repeated judgments of the same individual, but would encounter the same
difficulties as Mosteller's test if applied to group data.

REFERENCES 

[1] Bliss, C. I., Greenwood, M. L., and White, E. S. A rankit analysis of paired comparisons
for measuring the effects of sprays on flavor. Biometrics, 1956, 12, 381-403.

[2] Bradley, R. A. and Terry, M. E. The rank analysis of incomplete block designs. I. The
method of paired comparisons. Biometrika, 1952, 39, 324-345.

[3] Cramer, H. Mathematical methods of statistics. Princeton: Princeton Univ. Press, 1951.

[4] Finney, D. J. Statistical method in biological assay. London: Griffin, 1952.

[5] Guttman, L. An approach for quantifying paired comparisons and rank order. Ann.
math. Statist., 1946, 17, 144-163.

[6] Kendall, M. G. The advanced theory of statistics. Vol. I. London: Griffin, 1948.

[7] Mosteller, F. Remarks on the method of paired comparisons: III. A test of significance
for paired comparisons when equal standard deviation and equal correlations are
assumed. Psychometrika, 1951, 16, 207-218.

[8] Pearson, K. Tables for statisticians and biometricians. Part II. London: Univ. London,
1931.

[9] Thurstone, L. L. The method of paired comparisons for social values. J. abnorm.
soc. Psychol., 1927, 21, 384-400.

[10] Walsh, J. E. Concerning the effect of intraclass correlation on certain significance
tests. Ann. math. Statist., 1947, 18, 88-96.


Manuscript received 5/22/57 
Revised manuscript received 3/11/58 


PSYCHOMETRIKA—VOL, 23, No. 4 
DECEMBER, 1958 


A COMPARISON OF THE PRECISION OF THREE 
EXPERIMENTAL DESIGNS EMPLOYING A 
CONCOMITANT VARIABLE 


LEONARD S. FELDT 
STATE UNIVERSITY OF IOWA 


Three techniques are commonly employed to capitalize on a concomitant
variate and improve the precision of treatment comparisons: (1) stratification
of the experimental samples and use of a factorial design, (2) analysis
of covariance, and (3) analysis of variance of difference scores. The purpose of 
this paper is to compare the effectiveness of these alternatives in improving 
experimental precision, to identify the most precise design and the conditions 
under which its advantage holds, and to derive, in the case of the factorial 
approach, recommendations as to the optimal numbers of levels. 


In order to improve the precision of what would otherwise be a com- 
pletely randomized analysis of variance design, educational and psychological 
experimenters often consider designs which involve the use of a concomitant 
or control variable. Such a variable is usually defined by a characteristic of 
the subjects which more or less predetermines the general level of their cri- 
terion measure and is highly correlated with it. For example, test intelligence 
is frequently used as a control variable in educational experiments on teaching 
methods, since a high correlation is known to exist between intelligence test 
scores and the scores on the achievement tests typically used as criterion
measures in such experiments. Three techniques are commonly used to
improve the precision of the experimental design: two-factor analysis of
variance with two or more observations per cell, analysis of covariance, and
analysis of variance of difference scores. The purpose of this paper is to
compare the effectiveness of these alternatives and, in the case of the factorial
approach, to derive recommendations as to the optimal numbers of levels
under various conditions.

In the factorial or treatments-by-levels approach, as it has been called 
by Lindquist [17], levels or intervals are defined along the scale of values of 
the control variable, and subjects within any level are assigned to treatments 
at random. In almost all cases in which the treatments are superimposed by 
the experimenter the subjects are assigned to the several treatments in the 


same proportions for the various levels in order to simplify the analysis. The
experimenter has, as a rule, little intrinsic interest in the main effects of the
control variable. The assumption usually can be made that the subpopulations
corresponding to the various levels within any treatment differ in their
mean criterion measure. It is also often expected that there will be no
interaction between the levels of the control variable and the treatments variable;
however, the absence of interaction cannot be assumed. On many occasions
a test for the presence of interaction constitutes a secondary purpose of the 
experiment. Because both the levels and treatments variables are assumed 
to be fixed, nonrandom effects, the design is generally set up with at least 
two observations per cell, to make available an estimate of error variance 
that does not include the levels and interaction effects. Thus the purpose of 
the levels factor in the two-factor analysis is primarily to stratify the samples 
assigned to the other treatment categories. To the extent of the relationship 
between the control and criterion measures in the experimental population, 
such stratification results in control of an important source of error variance 
and hence improves the precision of the experiment. 
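The assignment procedure of the treatments-by-levels design may be sketched as follows. The sketch makes several assumptions not stated in the text: levels are formed by ranking subjects on the control variable and cutting the ranking into groups of equal size, the number of subjects divides evenly among the cells, and the function name, seed, and scores are hypothetical.

```python
import random


def assign_treatments_by_levels(control_scores, n_levels, treatments, seed=0):
    """Randomly assign subjects to treatments within levels of the control variable.

    Returns a dict mapping subject index to a (level, treatment) pair.
    Levels here are equal-sized groups of the ranked subjects (an assumption).
    """
    rng = random.Random(seed)
    order = sorted(range(len(control_scores)), key=lambda s: control_scores[s])
    per_level = len(order) // n_levels
    assignment = {}
    for level in range(n_levels):
        members = order[level * per_level:(level + 1) * per_level]
        rng.shuffle(members)  # random order within the level
        for position, subject in enumerate(members):
            # Cycling through the treatments gives equal numbers per cell.
            assignment[subject] = (level, treatments[position % len(treatments)])
    return assignment


# Hypothetical control scores for 24 subjects, 3 levels, 2 treatments.
control_scores = [100 + (37 * s) % 24 for s in range(24)]
assignment = assign_treatments_by_levels(control_scores, 3, ["A", "B"])
print(sorted(assignment.items())[:4])
```

Each of the six level-by-treatment cells receives four subjects, so an estimate of error based on within-cells variation is available.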

Analysis of covariance provides a second alternative by which a poten- 
tial source of error variance may be controlled. In place of the stratification 
of the experimental samples to reduce random differences between treatment 
groups, regression equations are used to adjust criterion measure differences 
among subjects, to the extent that these differences are associated with con- 
trol measure differences. Before the technique may be applied some assump- 
tion must be made as to the nature of the relationship between the control 
and criterion variables and the homogeneity of this relationship from treat- 
ment to treatment. If these assumptions are fulfilled, any experiment designed 
to permit valid analysis as a completely randomized design will have a valid 
analysis via covariance techniques. 

The essential feature of the method of differences is the definition of the 
criterion. In the factorial and covariance approaches the control variable
score X and a criterion variable score Y are not combined; under the differ- 
ence method the criterion measure is defined as (X — Y) or (Y — X). These 
data are then analyzed as a completely randomized design. This technique 
is probably most frequently employed in cases where X and Y may be con- 
sidered parallel forms of a test. For example, in educational experimenta- 
tion, the X or control variable is often defined as a pretest administered 
before the initiation of the experimental treatments, and the Y variable is
defined by a final test administered after completion of the treatments. 
Rarely would an experimenter be inclined to use such a difference score un- 
less it made intuitive sense as a measure of change in performance or gain 
in skill. 

These three alternatives will be considered as they apply to the one-factor,
completely randomized design involving t treatments, with an independent
random sample assigned to each treatment. The number of experimental
subjects, N, will be divided equally among the t treatments; that is,
N = nt. Assume that these samples are drawn from t normally distributed
populations with equal variances. A continuous control variable X is available
which is linearly related to Y, the criterion measure. The population mean
and variance of X and the population value of the correlation coefficient, \rho,
between X and Y, are assumed equal for all t treatment populations. Finally,
assume that homoscedasticity in Y obtains around the regression line of Y
on X. The experimental situation here assumed is, therefore, one in which
the conditions of the completely randomized design and covariance are fully
satisfied. It is also typical of the situations in which the two-factor and
difference methods are employed.


Related Research 


The advantages of analysis of covariance over analysis of variance, when 
all assumptions of both models are satisfied and a highly correlated control 
variable is available, have been emphasized by writers in many experimental 
fields. The statistical literature includes many empirical demonstrations of 
the reduction in error variance accomplished through statistical control of
a pertinent variable. Considerably less attention has been paid to the
comparative efficiencies of other techniques which also capitalize on concomitant
information. 

Fisher [11, 12] develops and illustrates the three techniques under
consideration here, but does not compare them rigorously with respect to
precision. Other popular texts [1, 5, 8, 16, 21] follow much the same pattern.
Discussion of the efficiency of covariance designs is generally limited to a
comparison with the corresponding analysis of variance.

Federer [8] raises the problem of covariance versus stratification and
suggests various experimental situations in which both techniques might be
useful. While he makes no systematic study of their comparative precision,
Federer clearly favors stratification. He suggests the following rule to
experimenters: "If the experimental variation cannot be controlled by
stratification, then measure related variates and use covariance" ([8], pp. 483-484).
This view appears to be shared by most of the workers in the field. The
opinion is based on a number of considerations. (i) It is generally accepted
that the differences in the precision of the various designs are relatively small,
even for moderate numbers of degrees of freedom. (ii) The greater number of
assumptions required for a valid analysis of covariance renders the technique
less generally applicable. (iii) The experimenter is often forced to make the
crucial choice of a regression model on rather tenuous bases. (iv) The failure
of the data to meet the assumptions of the model is thought to be more
serious in covariance than in regular analysis of variance, especially with
respect to failure to satisfy regression assumptions. (v) The ...
the effects which are eliminated may actually be relevant to the objectives
of the trea ...

Outhwaite and Rutherford [20] give empirical evidence
that when the number of replicates per treatment is approximately equal to
the number of treatments, a modified Latin square is more precise
than a covariance analysis which takes into account all possible
regressions. Essentially, they conclude that in their case stratification on two
variables results in a more precise experiment than stratification on one and
statistical control on the other.

Lucas [18] has considered covariance designs in which treatment groups
were balanced with respect to the mean value of the control variate. Essential
to his discussion is the following expression for the expected value of the
adjusted mean square for treatments in a covariance analysis.


E[ms_t^*] = \sigma^2 + n [1 - T_{xx}/((t - 1)S_{xx})] [\sum (\mu_i - \mu)^2/(t - 1)].


In this expression, which assumes fixed treatment effects,


ms_t^* = adjusted mean square for treatments,
n = number of replicates per treatment,
t = number of treatments,
T_{xx} = sum of squares between treatments on X,
S_{xx} = total sum of squares on X,
\sum (\mu_i - \mu)^2 = sum of squared deviations of the treatment population
means from the mean for all treatment populations.


It is assumed that the treatment groups are random samples from a common
population on X. Lucas suggests that the term T_{xx}/[(t - 1)S_{xx}] may be used
as a measure of the loss in sensitivity due to failure to achieve perfect
balancing on X. He then indicates that for experiments involving small numbers
of degrees of freedom, a slight gain in efficiency may be obtained by a sampling
procedure which achieves balancing. These conclusions were reiterated by
Greenberg [15].

Gourlay [13] compared the techniques of stratification, covariance, and
the analysis of variance of differences and concluded that covariance always
results in the most precise experiment. However, Gourlay failed to consider the
sampling error involved in the estimate of \beta and disregarded the effect of
differences in degrees of freedom for error. Thus, in his discussion the error
variance of a single adjusted mean under a covariance analysis was
\sigma_y^2(1 - \rho^2)/n, where \sigma_y^2 is the variance of the criterion measure within any
treatment population, \rho the correlation between criterion and control
measures, and n the number of replicates per treatment. Such a value applies only
for the case in which the treatment sample mean on the control measure
equals the general mean on that variable, a condition which does not
generally hold in strict random sampling. All comparisons with this value naturally
indicated a difference in precision which favored covariance. Gourlay
dismissed the factorial or levels design without deriving any formal expression
for its precision, since the error variance of a treatment mean clearly
approached \sigma_y^2(1 - \rho^2)/n only as a limit.

Cox [6] made an extensive study of various techniques for employing
concomitant information in an experimental design. He employed two
measures of experimental imprecision. The first, which he called the true
imprecision of the experiment, was based on the population value of the average
error variance for the difference between two treatment means, adjusted by
covariance where appropriate. The second, which he called the apparent
imprecision of the experiment, was defined as the product of the true
imprecision times an adjustment factor based on the degrees of freedom for
error. This adjustment was originally proposed by Fisher [12]. It makes
possible a more meaningful comparison of the relative efficiency of two
experiments which utilize the same total number of subjects but which give
rise to unequal numbers of degrees of freedom for error. Symbolically, these
measures are defined as follows.


I_t = average var (M_j - M_k)/(2\sigma_0^2/n);

I_a = I_t (f + 3)/(f + 1).


In these expressions f is the degrees of freedom for error, \sigma_0^2 is the
variance of Y for fixed X; 2\sigma_0^2/n is the theoretical minimum for the variance of
the difference between two treatment means. Thus I_a > I_t \ge 1.00.

Cox evaluated I_t and I_a for the covariance and factorial designs for a
number of combinations of N, \rho, and t, the number of treatments. He
concluded that stratification is more advantageous for \rho < .6 and that covariance
becomes appreciably better than the block design only when \rho is as large as
.8 or more. He noted further that the block design is reasonably efficient for
any form of smooth regression, not just for linear regression. However, if the
distribution of X is leptokurtic, the efficiency of the block design is lowered
due to the end blocks having units with widely discrepant values of X.

The block design in Cox's discussion is formulated by ranking subjects
on X, subdividing the ranked subjects in groups of t each, and assigning one
subject per block at random to each treatment. The interaction of blocks
with treatments is then used as the error term. Such a design can rarely be
used in psychological or educational experimentation, since an a priori
assumption of no intrinsic interaction can rarely be made. In these fields the
blocks are not generally selected randomly, and the block effects and
interaction are regarded as fixed effects. The design is almost always set up with
two or more subjects per cell to make available an error estimate based on
within-cells variation. Such an estimate makes possible a test of the
significance of the interaction, an effect which often has considerable experimental
importance. The within-cells error estimate does not eliminate the problems
of inference which arise when interaction is present, of course. As indicated
in the final section of this paper, any interaction is equivalent to heterogeneity
of regression in analysis of covariance. It is pertinent, however, to examine
the accuracy of Cox's recommendations when applied to this type of block
design and to extend them to experiments based on such numbers of cases
as are commonly employed in educational and psychological research studies.


Index of Experimental Precision 


Probably the most satisfactory index for the comparison of the precision 
of two experimental designs is one based on the average variance of the 
difference between all pairs of treatment means, adjusted by covariance 
where appropriate. The comparison of designs might be made in terms of 
the population value of these error variances. However, such a comparison
fails to take into account variation from one design to another in the number
of degrees of freedom available for error. Such differences reflect variation in
the precision of the estimate of the error variance itself, and they become of 
some importance when the degrees of freedom available for error are quite 
small. To permit an evaluation which makes due allowance for information 
lost in the estimation of the error variance, the adjustment proposed by 
Fisher and noted in the previous section will be employed. 

The minimum value for the variance of the difference between the 
means of treatments j and k is 


min var (M_j - M_k) = 2\sigma_y^2(1 - \rho^2)/n.


In this expression \sigma_y^2 is the variance of the criterion measure in any treatment
population. This result follows immediately from the assumption of homo- 
scedasticity and the well-known expression for the variance of Y within any 
array for a given value of X. 

Following Cox [6], the true imprecision I_t of a given experimental design
is defined as the ratio of the population value of the average variance of
(M_j - M_k) for that design to min var (M_j - M_k). The apparent imprecision
I_a is defined as the product of the true imprecision times the adjustment
factor proposed by Fisher. That is, for a design designated 1, the true
imprecision is defined


I_{t1} = ave var_1 (M_j - M_k) / min var (M_j - M_k).


The apparent imprecision of this design is defined


I_{a1} = I_{t1} (f_1 + 3)/(f_1 + 1).


Thus for any pair of designs based on constant N, comparison of the
respective values of I_a will indicate that design which will yield the most precise
evaluation of the treatment effects.




Comparison of the Precision of the Three Designs 


Two-factor Analysis of Variance (Design 1)


Under this setup several intervals, not necessarily of equal length, are 
defined along the scale of values of the control variable. All treatments are 
assigned subjects from the various levels in equal numbers. The general case 
in which h levels of the control variable are employed will be considered. The 
limits of the subpopulation corresponding to the lowest level are $-\infty$ and
$X_1$. The general level i has the limits $X_{i-1}$ and $X_i$. The highest level
subpopulation has the lower limit $X_{h-1}$ and the upper limit of $+\infty$.

In the two-factor design generated by the introduction of levels on X, 
all assumptions of the analysis of variance pertain to the distributions of Y 
in the subpopulations 1, ..., i, ..., h within each treatment. The error
variance of this design represents an estimate of the variance of Y for these
subpopulations, which, if the assumptions of the mathematical model are to
be satisfied, must be equal. An expression for the variance of Y in the
subpopulation at level i will be derived; in general this variance is not exactly
equal to that at all other levels. That is, under the assumed experimental
situation a small degree of heterogeneity of variance can be expected in the
treatments-by-levels design. Expressions will then be derived for the average
within-cell variance, ${}_1I_t$, and ${}_1I_a$, and the latter will be evaluated for a
variety of experimental conditions.

From the assumption of homogeneous linear regression, the slope of the
regression line of Y on X within any level of the population is equal to the
slope of the regression line for the entire bivariate surface. Using lower case
letters to represent values specific to level i and upper case letters to repre-
sent the entire population, this relationship may be written as

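$$ \rho_{xy}\,\frac{\sigma_y}{\sigma_x} = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}. \tag{1} $$

From the assumption of homoscedasticity a second relationship involving
the variance of Y within any X array is

$$ \sigma_y^2 (1 - \rho_{xy}^2) = \sigma_Y^2 (1 - \rho_{XY}^2). \tag{2} $$

Solving (1) for $\rho_{xy}$, substituting this in (2), and solving for $\sigma_y^2$, the following
expression for criterion variance within level i of the treatment population
is derived (writing $\rho$ for $\rho_{XY}$).

$$ \sigma_y^2 = \sigma_Y^2\left[1 - \rho^2\left(1 - \frac{\sigma_x^2}{\sigma_X^2}\right)\right]. \tag{3} $$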


From (3) the variance of the criterion measure for any level subpopulation
is seen to be a function of the variance of the control measure for this
population. If the levels are defined to include subpopulations differing in
the variance of the control measure, they will differ in criterion measure
variance also. Thus if a linear relationship exists between X and Y, the
assumption of homogeneous cell variance will not, in general, be exactly
satisfied for the two-factor or treatments-by-levels design.

The degree of heterogeneity which obtains from level to level within 
any treatment may be demonstrated for two common experimental situa- 
tions in which the control variable is normally distributed: (i) the design in 
which the levels include equal proportions of the population, and (ii) the de- 
sign in which the levels are defined by equal intervals along the scale of values 
of the control variable. The variance of a segment of the normal distribution 
may be evaluated through integration by parts. For the unit normal
distribution, a general formula for the variance of segment i is

$$ \sigma_x^2 = 1 + \frac{x_{i-1} z_{i-1} - x_i z_i}{A} - \left(\frac{z_{i-1} - z_i}{A}\right)^2, $$

where A is the area included in the segment, $x_{i-1}$ and $x_i$ the segment limits,
and $z_{i-1}$ and $z_i$ the ordinates at these limits. This formula was used to solve
for $\sigma_x^2$ for 2, 4, ..., 10 levels under both the equal proportion and equal
interval definition of levels, and the resulting values substituted in (3). In
the case of levels defined by equal intervals, the range from $-3\sigma$ to $+3\sigma$
was used to establish interval limits; the lowest and highest intervals were
then extended to $-\infty$ and $+\infty$, respectively. In Tables 1 and 2 these vari-
ances are presented for each level, assuming $\sigma_Y^2 = 1$. Levels are numbered
from lowest (number 1) to highest.
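
The computation just described can be sketched in a few lines. The following is an illustration only, not the author's procedure: the function names are hypothetical, the control variable is taken to be unit normal, and the criterion variance follows (3) with the criterion variance in the treatment population set to 1. It evaluates the six-level case with a correlation of .8 under both definitions of levels.

```python
from statistics import NormalDist
import math

STD_NORMAL = NormalDist()  # unit normal control variable, as assumed in the text


def segment_variance(lower, upper):
    """Variance of a unit normal variable restricted to (lower, upper)."""
    def ordinate_terms(x):
        # Return (z, x*z); both vanish at an infinite limit.
        if math.isinf(x):
            return 0.0, 0.0
        z = STD_NORMAL.pdf(x)
        return z, x * z

    z_lo, xz_lo = ordinate_terms(lower)
    z_hi, xz_hi = ordinate_terms(upper)
    cdf_lo = 0.0 if lower == -math.inf else STD_NORMAL.cdf(lower)
    cdf_hi = 1.0 if upper == math.inf else STD_NORMAL.cdf(upper)
    area = cdf_hi - cdf_lo
    return 1.0 + (xz_lo - xz_hi) / area - ((z_lo - z_hi) / area) ** 2


def equal_proportion_limits(h):
    """Cut points so that each of h levels holds 1/h of the population."""
    inner = [STD_NORMAL.inv_cdf(i / h) for i in range(1, h)]
    return [-math.inf] + inner + [math.inf]


def equal_interval_limits(h, span=3.0):
    """Cut points dividing (-span, +span) into h equal intervals; the two
    outer intervals are then extended to infinity."""
    width = 2.0 * span / h
    inner = [-span + i * width for i in range(1, h)]
    return [-math.inf] + inner + [math.inf]


def criterion_variances(limits, rho, var_y=1.0):
    """Equation (3) at each level, with control variance 1 in the population."""
    return [
        var_y * (1.0 - rho ** 2 * (1.0 - segment_variance(a, b)))
        for a, b in zip(limits[:-1], limits[1:])
    ]


if __name__ == "__main__":
    for name, limits in (("equal proportions", equal_proportion_limits(6)),
                         ("equal intervals", equal_interval_limits(6))):
        v = criterion_variances(limits, rho=0.8)
        print(name, [round(x, 4) for x in v], "max/min:", round(max(v) / min(v), 3))
```

Only the ratio of the largest to the smallest variance matters for judging heterogeneity, so the sketch prints that ratio for each definition of levels.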

It may be seen from (3) and the values in Tables 1 and 2 that the degree 
of heterogeneity depends on the value of $\rho$ and the manner in which the
levels are defined. Levels defined by equal intervals give rise to variances 
slightly less heterogeneous than those arising from levels defined by equal 
proportions. In both instances the degree of heterogeneity is quite small, 
even for values of $\rho$ as large as .8. For example, if $\rho = .8$, and six levels are
used, the ratio of the smallest variance to the largest is 1:1.32 when the 
equal proportion method of constituting levels is used, 1:1.05 when the equal 
interval method is used. In view of the literature on the effects of small 
degrees of heterogeneity of variance on the F test [3, 4, 7, 14, 19] this degree 
of heterogeneity will probably not seriously invalidate the test. 

The average variance over all levels has been computed for the various 
situations covered in Tables 1 and 2; these values are presented in the last 
line of each table. Represent this average as 


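$$ \bar{\sigma}_y^2 = \sigma_Y^2\left[1 - \rho^2\left(1 - \frac{\bar{\sigma}_x^2}{\sigma_X^2}\right)\right]. \tag{4} $$


Then write

$$ \operatorname{var}(M_j - M_k) = 2\bar{\sigma}_y^2 / n, $$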


TABLE 1


The Variance for Levels Which Include Equal Proportions
of a Normal Population ($\sigma_Y^2 = 1.00$)


[Table body illegible in the source scan. Rows are level numbers and columns are numbers of levels; each entry gives the criterion variance at that level from (3), and the last line gives the mean over levels.]


TABLE 2


The Variance for Levels Defined by Equal Intervals
between $\pm 3\sigma$ of a Normal Population ($\sigma_Y^2 = 1.00$)


[Table body illegible in the source scan. Layout as in Table 1; the last line gives the weighted mean over levels.]


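$$ {}_1I_t = \frac{1 - \rho^2\left(1 - \bar{\sigma}_x^2/\sigma_X^2\right)}{1 - \rho^2} \tag{5} $$

and

$$ {}_1I_a = {}_1I_t\left(\frac{N - ht + 3}{N - ht + 1}\right), \tag{6} $$

where h equals the number of levels.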

From (5) and the values of Tables 1 and 2 it is clear that ${}_1I_t$, and
ultimately ${}_1I_a$, depends upon the number of levels of X the experimenter
employs. As the number of levels increases, the variance of the control variable
within levels decreases, and ${}_1I_t$ approaches 1.00. It is of some interest to
note in these tables the relative rapidity with which the numerator of (5)
approaches $1 - \rho^2$.

It should not be inferred from (5) that the maximum experimental
precision is achieved when the maximum number of levels is used. It is true
that, other things being equal, smaller values of ${}_1I_t$ indicate a smaller
population variance for $M_j - M_k$. However, in a two-factor experiment, like
that considered here, additional degrees of freedom are lost from error with
every addition to the number of levels. This reduction in degrees of freedom
represents a loss in experimental precision, a loss that may be either less than,
equal to, or greater than the gain associated with the reduction in the
population variance of $M_j - M_k$. For example, a loss of four degrees of freedom
from 120 to 116 represents only a negligible loss of power and may be more
than justified by the decrease in the error variance derived from one or two
additional levels. On the other hand, a loss of four degrees of freedom from
20 to 16 may result in a loss in power that exceeds the gain accruing from the
reduction in error variance. Thus it can not be unconditionally concluded
that the greater the number of levels, the greater the precision of the
treatments-by-levels design. The optimal number of levels is contingent upon the
total number of degrees of freedom available; it is for this purpose Fisher's
adjustment for ${}_1I_t$ is introduced.
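
The two losses of four degrees of freedom mentioned above can be compared directly through the adjustment factor (f + 3)/(f + 1). The short sketch below is an illustration only; the function names are hypothetical. It expresses each loss as the proportional increase in apparent imprecision that results from the change in f alone, which comes to roughly 0.06 per cent for the drop from 120 to 116 and roughly 2 per cent for the drop from 20 to 16.

```python
def fisher_adjustment(f):
    """Adjustment factor (f + 3)/(f + 1) for f error degrees of freedom."""
    return (f + 3) / (f + 1)


def relative_cost(f_before, f_after):
    """Proportional increase in apparent imprecision due only to the
    reduction in error degrees of freedom."""
    return fisher_adjustment(f_after) / fisher_adjustment(f_before) - 1.0


if __name__ == "__main__":
    print("120 -> 116:", round(relative_cost(120, 116), 5))
    print(" 20 ->  16:", round(relative_cost(20, 16), 5))
```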

Since reductions in the population error variance decrease monotonic-
ally with increasing numbers of levels, and since the effect of the adjustment
factor (f + 3)/(f + 1) becomes more and more important with increasing
numbers of levels, there exists a point beyond which further increases in the
number of levels are no longer justified. That is, for every experiment in which
the null hypothesis is false, there must exist an optimal number of levels at
which ${}_1I_a$ is a minimum. Fewer or more levels than this optimal number will
result in less precision in the evaluation of the treatment effects. Values of
${}_1I_a$ were computed for t = 2, 5; N = 20, 30, 50, 70, 100, 150 to determine the
number of levels at which this minimum is reached and the value of ${}_1I_a$ at
this point.
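
The search for the optimal number of levels can be sketched as follows. This is an illustration under stated assumptions, not the computation actually used for the tables: levels are defined by equal proportions of a unit normal control variable, N is the total number of subjects, the error degrees of freedom are N - ht, and the minimum of two subjects per cell (`min_per_cell`) is an assumption of the sketch. All function names are hypothetical.

```python
from statistics import NormalDist
import math

_N01 = NormalDist()


def _segment_variance(lower, upper):
    """Variance of a unit normal variable restricted to (lower, upper)."""
    z_lo = 0.0 if math.isinf(lower) else _N01.pdf(lower)
    z_hi = 0.0 if math.isinf(upper) else _N01.pdf(upper)
    xz_lo = 0.0 if math.isinf(lower) else lower * z_lo
    xz_hi = 0.0 if math.isinf(upper) else upper * z_hi
    cdf_lo = 0.0 if math.isinf(lower) else _N01.cdf(lower)
    cdf_hi = 1.0 if math.isinf(upper) else _N01.cdf(upper)
    area = cdf_hi - cdf_lo
    return 1.0 + (xz_lo - xz_hi) / area - ((z_lo - z_hi) / area) ** 2


def mean_within_level_variance(h):
    """Average control-variable variance over h equal-proportion levels."""
    cuts = [-math.inf] + [_N01.inv_cdf(i / h) for i in range(1, h)] + [math.inf]
    return sum(_segment_variance(a, b) for a, b in zip(cuts, cuts[1:])) / h


def apparent_imprecision(n_total, t, rho, h):
    """True imprecision as in (5) times the adjustment factor as in (6)."""
    f = n_total - h * t  # within-cells error degrees of freedom
    true_imprecision = (
        1 - rho ** 2 * (1 - mean_within_level_variance(h))
    ) / (1 - rho ** 2)
    return true_imprecision * (f + 3) / (f + 1)


def optimal_levels(n_total, t, rho, min_per_cell=2):
    """Number of levels with the smallest apparent imprecision.

    min_per_cell is an assumption of this sketch, not a rule from the text.
    """
    max_h = n_total // (min_per_cell * t)
    return min(range(1, max_h + 1),
               key=lambda h: apparent_imprecision(n_total, t, rho, h))


if __name__ == "__main__":
    for rho in (0.2, 0.4, 0.6, 0.8):
        h = optimal_levels(150, 2, rho)
        print(rho, h, round(apparent_imprecision(150, 2, rho, h), 3))
```

With a single level the expression reduces to that of the completely randomized design, which provides a convenient check on the sketch.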

These data are presented in Tables 3 and 4. For comparative purposes the
values of $I_a$ which hold for the completely randomized design have been
included in parentheses in Table 4. For the completely randomized design

$$ I_a = \frac{1}{1 - \rho^2}\left(\frac{N - t + 3}{N - t + 1}\right). $$


TABLE 3


Factorial Design: Optimal Number of Levels for Selected
Experimental Conditions, Assuming Levels Defined
by Equal Proportions of the Population


[Table body illegible in the source scan. Rows are combinations of $\rho$ and t; columns are N = 20, 30, 50, 70, 100, 150.]


TABLE 4


Factorial Design (with Optimum Number of Levels)
and Completely Randomized Design: Values of
$I_a$ for Selected Experimental Conditions*


[Table body illegible in the source scan. Rows are combinations of $\rho$ and t; columns are N = 20, 30, 50, 70, 100, 150.]

* Value of $I_a$ for completely randomized design appears in parentheses.


It may be noted in Table 3 that for several combinations of N, $\rho$, and
t the optimal number of levels would not permit an equal, integral number
of subjects per cell within a treatment. For example, with N = 20, $\rho$ = .4,
t = 2, the use of three levels would require either that one level include an
extra subject, or that a total of 18 rather than 20 subjects be used. In the
preparation of Table 4 the possible necessity of removing subjects was
ignored, and ${}_1I_a$ was computed from the full value of N. This procedure
greatly facilitated the comparison of the precision of the various designs 
for selected values of N at the cost of only negligible error for a few of the 
selected conditions. 

Within the scope of the values considered, the data in Tables 3 and 4 
show several specific trends which experienced researchers may have already 
recognized. These relationships may be summarized as follows. The optimal
number of levels tends to be larger for (i) larger values of $\rho$, (ii) larger numbers
of experimental subjects, and (iii) smaller numbers of treatments. Each of
these trends is consistent with the recognized effects of the size of the correla- 
tion between the criterion and control variables and of reductions in degrees 
of freedom on experimental precision. The optimal numbers of levels pre- 
sented in Table 3 should serve as useful guides in the planning of two-factor 
experiments. Because changes in precision are relatively small as the number 
of levels approaches the optimal value, linear interpolations will yield suitably 
accurate estimates, except for interpolation along the scale of $\rho$. This
interpolation should be made in terms of $\rho^2$.


Analysis of Covariance (Design 2) 


The sample estimate of the variance of the difference between two 
adjusted means in an analysis of covariance is 


$$ \operatorname{var}(M'_j - M'_k) = E'_{YY}\left[\frac{2}{n} + \frac{(\bar{X}_j - \bar{X}_k)^2}{f_e E_{XX}}\right]. \tag{7} $$

In this expression $M'_j$ represents an adjusted treatment mean, $E'_{YY}$ the
adjusted mean square for error for the criterion measure, $E_{XX}$ the mean
square for error in X, and $f_e$ the degrees of freedom for error in X. Finney [10]
has recommended that when many pairs of treatments are to be tested, an
average variance for error may be used. It may be computed as follows.

$$ \operatorname{ave\,var}(M'_j - M'_k) = \frac{2E'_{YY}}{n}\left[1 + \frac{T_{XX}}{f_e E_{XX}}\right]. \tag{8} $$

In this formula, $T_{XX}$ represents the mean square for treatments on the control
variable. Assuming a normal distribution for X, the second term within
the brackets is an F ratio divided by $f_e$. Taking the expected value of this
expression first with respect to Y and then with respect to X, then

$$ \operatorname{ave\,var}(M'_j - M'_k) = \frac{2\sigma_Y^2 (1 - \rho^2)}{n}\left(1 + \frac{1}{f_e - 2}\right). \tag{9} $$

The value ${}_2I_t$ thus becomes

$$ {}_2I_t = 1 + \frac{1}{f_e - 2} = \frac{N - t - 1}{N - t - 2}. \tag{10} $$

Since the degrees of freedom for error in the analysis of adjusted scores
is N - t - 1,

$$ {}_2I_a = \left(\frac{N - t - 1}{N - t - 2}\right)\left(\frac{N - t + 2}{N - t}\right). \tag{11} $$

TABLE 5


Analysis of Covariance: Values of ${}_2I_a$ for
Selected Experimental Conditions


                        N
 t     20      30      50      70      100     150
 2    1.181   1.113   1.063   1.045   1.031   1.020
 5    1.221   1.127   1.068   1.047   1.032   1.021
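
Because the apparent imprecision of the covariance design depends only on N and t, the tabled values can be recomputed directly. The sketch below is an illustration with hypothetical function names; it takes N as the total number of subjects, N - t as the error degrees of freedom in X, and N - t - 1 as the error degrees of freedom for the adjusted scores, as stated in the text.

```python
def covariance_true_imprecision(n_total, t):
    """1 + 1/(f_e - 2), with f_e = N - t error degrees of freedom in X."""
    f_e = n_total - t
    return 1 + 1 / (f_e - 2)


def covariance_apparent_imprecision(n_total, t):
    """True imprecision times (f + 3)/(f + 1), with f = N - t - 1."""
    f = n_total - t - 1
    return covariance_true_imprecision(n_total, t) * (f + 3) / (f + 1)


if __name__ == "__main__":
    sizes = (20, 30, 50, 70, 100, 150)
    for t in (2, 5):
        print(t, [round(covariance_apparent_imprecision(n, t), 3) for n in sizes])
```

The correlation does not enter the computation, which is why a single table row serves for every value of the correlation.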


Values of ${}_2I_a$ for analyses of covariance have been tabulated in Table 5.
From a comparison of these values with those in Table 4 note that for $\rho < .4$
the factorial approach results in approximately equal or greater precision
than covariance; for $\rho \geq .6$ the advantage is in favor of covariance. For
relatively high values of $\rho$ and relatively small values of N the difference in
precision is appreciable. This difference arises because relatively small
values of N do not permit a sufficiently large number of levels to exploit
fully the value of the control variable. However, the marked superiority of
covariance occurs for values of $\rho$ which are rarely encountered in educational
and psychological experiments. It may also be noted that for $\rho < .2$ and small
values of N neither covariance nor the factorial design yields appreciably
greater precision than a completely randomized design.



Differences (Design 3) 

The method of differences consists of a completely randomized analysis 
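applied to the measure (Y - X). The variance of this measure is

$$ \operatorname{var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\rho\sigma_X\sigma_Y. \tag{12} $$

Since the use of difference measures is generally restricted to instances in
which X and Y may be regarded as parallel test forms or replicated measure-
ments, assume $\sigma_X = \sigma_Y$. Thus

$$ \operatorname{var}(X - Y) = 2\sigma_Y^2 - 2\rho\sigma_Y^2 = 2\sigma_Y^2 (1 - \rho), \tag{13} $$

and

$$ \operatorname{var}(M_j - M_k) = 4\sigma_Y^2 (1 - \rho)/n. \tag{14} $$

Thus

$$ {}_3I_t = \frac{2}{1 + \rho} \tag{15} $$

and

$$ {}_3I_a = \left(\frac{2}{1 + \rho}\right)\left(\frac{N - t + 3}{N - t + 1}\right). \tag{16} $$
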
TABLE 6


Analysis of Variance of Differences: Values of ${}_3I_a$
for Selected Experimental Conditions


                             N
 ρ     t     20      30      50      70      100     150

 .2    2    1.842   1.782   1.735   1.715   1.700   1.689
       5    1.875   1.795   1.739   1.717   1.701   1.690

 .4    2    1.579   1.527   1.487   1.470   1.457   1.448
       5    1.607   1.538   1.491   1.472   1.458   1.448

 .6    2    1.382   1.336   1.301   1.286   1.275   1.267
       5    1.406   1.346   1.304   1.288   1.276   1.267

 .8    2    1.228   1.188   1.156   1.143   1.134   1.126
       5    1.250   1.196   1.159   1.145   1.134   1.126
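
The entries for the difference design follow from the true imprecision 2/(1 + ρ) multiplied by the adjustment factor with N - t error degrees of freedom. The sketch below is an illustration with a hypothetical function name; N is taken as the total number of subjects.

```python
def difference_apparent_imprecision(n_total, t, rho):
    """Apparent imprecision of the difference design: 2/(1 + rho) times
    (f + 3)/(f + 1), with f = N - t error degrees of freedom."""
    f = n_total - t
    return 2 / (1 + rho) * (f + 3) / (f + 1)


if __name__ == "__main__":
    sizes = (20, 30, 50, 70, 100, 150)
    for rho in (0.2, 0.4, 0.6, 0.8):
        for t in (2, 5):
            row = [round(difference_apparent_imprecision(n, t, rho), 3) for n in sizes]
            print(rho, t, row)
```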


Values of ${}_3I_a$ for this design are tabulated in Table 6. Comparison of
these with corresponding values for the factorial and covariance designs
clearly indicates the lower precision of the difference approach. It is also
indicated that unless a substantial correlation exists between the control
and criterion variables the difference approach results in considerably lower 
precision than that yielded by the completely randomized design. 
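
How substantial the correlation must be can be seen from a short comparison. As a sketch, assume the difference design and the completely randomized design are run with the same N and t, so that both have N - t error degrees of freedom and the adjustment factor cancels. The true imprecision is 2/(1 + ρ) for the difference design and 1/(1 - ρ²) for the completely randomized design, so that

```latex
\frac{2/(1+\rho)}{1/(1-\rho^{2})} = 2(1-\rho),
\qquad\text{and hence}\qquad
\frac{2}{1+\rho} < \frac{1}{1-\rho^{2}}
\iff \rho > \tfrac{1}{2}.
```

Under these assumptions the difference approach is the less precise of the two whenever the correlation falls below one half.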


Discussion 


The measures of comparative precision derived above are an important 
consideration, but not the sole consideration in the choice of experimental 
design. Serious attention must be given to the effect of possible departure 
from the assumptions on which the methods are based, the importance of 
design simplicity in the communication of results, and the extent to which
valuable supplementary information may be derived from one or another
of the designs. Several writers [5, 9, 11, 18] have pointed out that when two
characteristics are to be controlled, blocking on one factor and covariance
control on the other may be advantageous. This procedure, which is some-
what equivalent to multiple covariance, can yield valuable supplementary
information and still retain the simplicity of analysis of simple covariance.
However, the values of $I_a$ derived by Cox strongly suggest that the combina-
tion can not be defended on the basis of the precision of treatment comparisons.

Most writers probably agree with Kempthorne ([16], p. 139) that
dependence upon the accuracy of the assumed regression model constitutes
a severe restriction on the usefulness of covariance techniques. The absence
of any regression assumptions in the levels design, on the other hand, repre-
sents a considerable argument in its favor, especially in those instances in
which the number of degrees of freedom is fairly large and the difference in
the precision of the designs is relatively insignificant. It would, in fact, seem
justifiable to conclude from the data in Tables 4 and 5 that the less restrictive
assumptions of the factorial design more than compensate for the relatively
small advantage in precision which may obtain for covariance. This is
especially true in educational and psychological research in which the number
of degrees of freedom for error is usually quite large and in which
little is known about the form of the relationship between criterion and
control measures. This presumes, of course, that the experimenter will
include a number of levels which is reasonably close to the optimal value.

The more general applicability of the factorial design when regression
differs from treatment population to treatment population can be seen by
considering the interaction between treatments and levels in the two-factor
design. It will be shown that heterogeneity of regression is equivalent to such
an interaction, and that the presence of either implies the other.

To prove this, first note that the phenomenon of interaction manifests
itself by variation in the size of the treatment effects
from at least one level of the control variable to one other. If, as before,
$\bar{Y}_{ij}$ represents the criterion mean of the subpopulation corresponding to
level i and treatment j, the presence of interaction may be represented by
the inequality


$$ \bar{Y}_{ij} - \bar{Y}_{iv} \neq \bar{Y}_{hj} - \bar{Y}_{hv} $$
or
$$ \bar{Y}_{ij} - \bar{Y}_{hj} \neq \bar{Y}_{iv} - \bar{Y}_{hv}, \tag{17} $$


where the subscripts i and h refer to levels, j and v to treatments. 

If $f_j(x)$ and $f_v(x)$ represent the population regression functions, exclusive
of any constant term, for treatments j and v, the four subpopulation means
in these inequalities must satisfy the following equations.

$$
\begin{aligned}
\bar{Y}_{ij} &= f_j(\bar{x}_i) - a_j, & \bar{Y}_{hj} &= f_j(\bar{x}_h) - a_j;\\
\bar{Y}_{iv} &= f_v(\bar{x}_i) - a_v, & \bar{Y}_{hv} &= f_v(\bar{x}_h) - a_v.
\end{aligned}
\tag{18}
$$

The constants $a_j$ and $a_v$ represent the effects of treatments j and v, respectively,
plus any constant term in the regression equations. Subtracting the second
member of each pair from the first yields

$$
\begin{aligned}
\bar{Y}_{ij} - \bar{Y}_{hj} &= f_j(\bar{x}_i) - f_j(\bar{x}_h),\\
\bar{Y}_{iv} - \bar{Y}_{hv} &= f_v(\bar{x}_i) - f_v(\bar{x}_h).
\end{aligned}
\tag{19}
$$

According to (17), however, the left sides of these equations are not equal.
Therefore,

$$ f_j(\bar{x}_i) - f_j(\bar{x}_h) \neq f_v(\bar{x}_i) - f_v(\bar{x}_h). \tag{20} $$

This inequality can hold only if $f_j(x)$ is not identical to $f_v(x)$. Thus it has
been shown that the presence of interaction in the levels design implies
heterogeneous regression in the covariance analysis.

To prove the converse, first note that heterogeneous regression means,
by definition, nonidentical or nonparallel regression functions. That is,
there exist at least two levels such that

$$ f_j(\bar{x}_i) - f_j(\bar{x}_h) \neq f_v(\bar{x}_i) - f_v(\bar{x}_h). \tag{21} $$

But since $\bar{Y}_{ij} = f_j(\bar{x}_i) - a_j$, it follows that

$$ \bar{Y}_{ij} - \bar{Y}_{hj} \neq \bar{Y}_{iv} - \bar{Y}_{hv}. \tag{22} $$

Equation (22) defines an interaction between treatments j and v and levels
h and i. Thus heterogeneous regression implies the presence of interaction.
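
The equivalence can be illustrated numerically. The sketch below is an illustration only: it assumes linear regression within each treatment, and the slopes, constants, and level means are hypothetical values chosen for the example. The treatment constant is added here; its sign convention is immaterial because the constants cancel in the contrast. Equal slopes give a zero interaction contrast, and unequal slopes give a nonzero one.

```python
def cell_means(slopes, constants, level_means):
    """Criterion mean of each (level, treatment) cell under linear regression:
    slope of the treatment times the level mean, plus the treatment constant."""
    return [[b * x + a for b, a in zip(slopes, constants)] for x in level_means]


def interaction_contrast(means, i, h, j, v):
    """(Y_ij - Y_hj) - (Y_iv - Y_hv); a nonzero value indicates interaction
    between treatments j, v and levels i, h."""
    return (means[i][j] - means[h][j]) - (means[i][v] - means[h][v])


# Hypothetical control-variable means for three levels.
LEVEL_MEANS = [-1.0, 0.0, 1.0]

# Same slope in both treatments: homogeneous regression.
parallel = cell_means([0.5, 0.5], [0.0, 2.0], LEVEL_MEANS)
# Different slopes: heterogeneous regression.
nonparallel = cell_means([0.5, 0.2], [0.0, 2.0], LEVEL_MEANS)

if __name__ == "__main__":
    print("homogeneous:  ", interaction_contrast(parallel, 2, 0, 0, 1))
    print("heterogeneous:", interaction_contrast(nonparallel, 2, 0, 0, 1))
```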

If other assumptions are satisfied and if the combined levels subpopula-
tions constitute a meaningful experimental population, a valid and meaningful
test of main effects of treatments may still be made in the factorial design
against a within-cells estimate of error variance, even though an interaction
exists. On the other hand, heterogeneous regression renders the covariance
technique, as it is typically applied in educational and psychological research,
somewhat invalid. The extent of this lack of validity has not been extensively
investigated. If the usual covariance model is used, the effects would appear
to be more serious than those of non-normality and heterogeneity of variance
are to an analysis of variance. In cases of heterogeneity of regression, the
obtained error variance would probably overestimate the true error variance,
and thus increase the probability of retaining a false null hypothesis. Further
research may indicate such violation of assumptions is no more serious than
heterogeneity of variance or non-normality. However, no such conclusion
seems warranted at this time.

There are available more general covariance models than that typically
employed in educational and psychological research. These models may be
applied to many cases of nonlinear and heterogeneous regression and allow
a valid test of treatment effects, so long as the mathematical form of the
regression equations may be specified on a priori grounds. The phenomenon
of interaction, as manifested in the levels design, does not, therefore, entirely
preclude a valid analysis by covariance techniques. However, the necessity
of knowing the appropriate regression model represents a real restriction
on the general applicability of covariance techniques. In most cases too little
is known about the fundamental nature of the measures used in psychological
and educational research to make any but the most tenuous
assumption concerning the form of the relationship.

Then, too, the experimenter in psychology and education is
not, as a rule, familiar with the more general covariance models, which are
not illustrated or discussed in statistical texts intended for workers in these
fields. Although heterogeneity of regression is equivalent to interaction
between treatments and levels, few experimenters are accustomed to think
of interaction in these terms. The process of interpreting and communicating
the results of an experiment is thereby made more difficult.

Thus in such cases as the data on the control variable are available
before the experiment is initiated, the most defensible choice would appear
to be the treatments-by-levels design. The covariance
technique, on the other hand, might be reserved for experiments in which
stratification is not feasible.

It should be noted that interaction in the treatments-by-levels
design will often be accompanied by heterogeneity of variance from treatment
to treatment within the same level. If, for example, the interaction results
from heterogeneous linear regression due to differences in $\rho$ for the various
treatments, the variance in the population at level i and treatment j will
not equal that at level i and treatment v. This may be seen from (3), which,
with varying values of $\rho$, might be written

$$ \sigma_{y_{ij}}^2 = \sigma_Y^2\left[1 - \rho_j^2\left(1 - \frac{\sigma_{x_i}^2}{\sigma_X^2}\right)\right]. $$

Varying values of $\rho_j$, coupled with constant values of $\sigma_{x_i}$, would result
in some heterogeneity among the subpopulation variances at level i.

This degree of heterogeneity of variance would be only slightly more 
serious than that which has been demonstrated from level to level within 
any treatment. For example, if $\rho_j = .6$ and $\rho_v = .3$, the population at the
third level in a six-level experiment such as that described in previous
examples would have a variance of $.645\sigma_Y^2$ in treatment j, $.911\sigma_Y^2$ in treatment v.
The ratio is less than 2 to 1, however, and this degree of heterogeneity would
not, according to the findings of the investigators referred to earlier, seriously
affect the validity of the F test. On the other hand, the effect of this degree
of heterogeneity of regression on the validity of an analysis of covariance
may well be more serious.
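
The two variances in this example can be checked with a short computation. The sketch below is an illustration with hypothetical function names; it assumes six levels defined by equal proportions of a unit normal control variable, so that the third level runs from the lower tercile point to the median, and it sets the criterion variance in the treatment population to 1.

```python
from statistics import NormalDist

N01 = NormalDist()


def finite_segment_variance(lower, upper):
    """Variance of a unit normal variable restricted to the finite
    interval (lower, upper)."""
    z_lo, z_hi = N01.pdf(lower), N01.pdf(upper)
    area = N01.cdf(upper) - N01.cdf(lower)
    return 1.0 + (lower * z_lo - upper * z_hi) / area - ((z_lo - z_hi) / area) ** 2


def criterion_variance(rho, control_variance_within_level):
    """Criterion variance at a level for a treatment with correlation rho,
    taking both population variances as 1."""
    return 1.0 - rho ** 2 * (1.0 - control_variance_within_level)


# Third of six equal-proportion levels: from the 2/6 point to the 3/6 point.
third_level_variance = finite_segment_variance(N01.inv_cdf(2 / 6), N01.inv_cdf(3 / 6))
var_j = criterion_variance(0.6, third_level_variance)
var_v = criterion_variance(0.3, third_level_variance)

if __name__ == "__main__":
    print(round(var_j, 4), round(var_v, 4), round(var_v / var_j, 3))
```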


REFERENCES 

[1] Anderson, R. L. and Bancroft, T. A. Statistical theory in research. New York:
McGraw-Hill, 1952.

[2] Bartlett, M. S. The use of transformations. Biometrics, 1947, 3, 39-52.

[3] Box, G. E. P. Some theorems on quadratic forms applied in the study of analysis of
variance problems, I. Effect of inequality of variance in the one-way classification.
Ann. math. Statist., 1954, 25, 290-302.

[4] Cochran, W. G. Some consequences when the assumptions for analysis of variance are
not satisfied. Biometrics, 1947, 3, 22-38.

[5] Cochran, W. G. and Cox, G. M. Experimental designs. New York: Wiley, 1950.

[6] Cox, D. R. The use of a concomitant variable in selecting an experimental design.
Biometrika, 1957, 44, 150-158.

[7] Eisenhart, C. The assumptions underlying the analysis of variance. Biometrics, 1947,
3, 1-21.

[8] Federer, W. T. Experimental design. New York: Macmillan, 1955.

[9] Federer, W. T. and Schlottfeldt, C. S. The use of covariance to control gradients in
experiments. Biometrics, 1954, 10, 282-290.

[10] Finney, D. J. Standard errors of yields adjusted for regression on an independent
measurement. Biometrics, 1946, 2, 53-55.

[11] Fisher, R. A. Statistical methods for research workers. (12th ed.) London: Oliver and
Boyd, 1952.

[12] Fisher, R. A. The design of experiments. (5th ed.) London: Oliver and Boyd, 1949.

[13] Gourlay, N. Covariance analysis and its applications in psychological research. Brit.
J. statist. Psychol., 1953, 6, 25-34.

[14] Graybill, F. Variance heterogeneity in a randomized block design. Biometrics, 1954,
10, 516-520.

[15] Greenberg, B. G. Use of covariance and balancing in analytical surveys. Amer. J.
publ. Hlth, 1953, 43, 692-9.

[16] Kempthorne, O. The design and analysis of experiments. New York: Wiley, 1952.

[17] Lindquist, E. F. Design and analysis of experiments in psychology and education.
Boston: Houghton Mifflin, 1953.

[18] Lucas, H. L. Design and analysis of feeding experiments with milking dairy cattle.
Raleigh, N. C.: Inst. Statist. Mimeo. Series No. 18, Univ. North Carolina, 1951.

[19] Norton, D. W. An empirical investigation of some effects of nonnormality and het-
erogeneity on the F-distribution. Unpublished doctoral dissertation, State Univ. Iowa,
1952.

[20] Outhwaite, A. D. and Rutherford, A. Covariance analysis as an alternative to strati-
fication in the control of gradients. Biometrics, 1955, 11, 431-440.

[21] Snedecor, G. W. Statistical methods. (5th ed.) Ames, Iowa: Iowa State College Press,
1950.


Manuscript received 3/19/57
Revised manuscript received 3/6/58


PSYCHOMETRIKA— VOL. 23, NO. 4 
DECEMBER, 1958 


AN AXIOMATIC FORMULATION AND GENERALIZATION 
OF SUCCESSIVE INTERVALS SCALING* 


ERNEST ADAMS
UNIVERSITY OF CALIFORNIA, BERKELEY 
AND 
SAMUEL MESSICK 
EDUCATIONAL TESTING SERVICE 


A formal set of axioms is presented for the method of successive intervals,
and directly testable consequences of the scaling assumptions are derived.
Then by a systematic modification of basic axioms the scaling model is gener-
alized to non-normal stimulus distributions of both specified and unspecified
form.


Thurstone's scaling models of successive intervals [7, 21] and paired
comparisons [17, 24] have been severely criticized because of their dependence
upon an apparently untestable assumption of normality. This objection
was recently summarized by Stevens [22], who insisted that the procedure
of using the variability of a psychological measure to equalize scale units
"smacks of a kind of magic--a rope trick for climbing the hierarchy of scales.
The rope in this case is the assumption that in the sample of individuals
tested the trait in question has a canonical distribution, (e.g., 'normal')
.... There are those who believe that the psychologists who make assump-
tions whose validity is beyond test are hoist with their own petard ...."
Luce [13] has also viewed these models as part of an "extensive and unsightly
literature which has been largely ignored by outsiders, who have correctly
condemned the ad hoc nature of the assumptions."

Gulliksen [11], on the other hand, has explicitly discussed the testability
of these models and has suggested alternative procedures for handling data
which do not satisfy the checks. Empirical tests of the scaling theory were
also mentioned or implied in several other accounts of the methods [e.g.,
8, 9, 12, 15, 21, 25]. Criteria of goodness of fit have been presented [8, 18],
which, if met by the data, would indicate satisfactory scaling within an
acceptable error. Random errors and sampling fluctuations, as well as sys-
tematic deviations from scaling assumptions, are thereby evaluated by
over-all internal consistency checks. However, tests of the scaling assumptions,
and in particular the normality hypothesis, have not yet been explicitly
derived in terms of the necessary and sufficient conditions required to satisfy
the model. Recently Rozeboom and Jones [20] and Mosteller [16] have
investigated the sensitivity of successive intervals and paired comparisons,
respectively, to a normality requirement, indicating that departures from
normality in the data are not too disruptive of scale values with respect to
goodness of fit, but direct empirical consequences of the assumptions of the
model were not specified as such.

*This paper was written while the authors were attending the 1957 Social Science
Research Council Summer Institute on Applications of Mathematics in Social Science.
The research was supported in part by Stanford University under [...]
Group Psychology Branch, Office of Naval Research, by Social Science Research
Council, and by Educational Testing Service. [...] for his interest and encouragement
throughout the writing of the report and Dr. Harold Gulliksen for his helpful and
instructive comments on the manuscript.

The present axiomatic characterization of a well-established scaling model
was attempted because of certain advantages which might accrue: (a) an
ease of generalization that follows from a precise knowledge of formal prop- 
erties by systematically modifying axioms, and (b) an ease in making com- 
parisons between the properties of different models. The next section deals 
with the axioms for successive intervals and serves as the basis for the ensuing 
section, in which the model is generalized to non-normal stimulus distribu- 
tions. One outcome of the following formalization which should again be 
highlighted is that the assumption of normality has directly verifiable con- 
sequences and should not be characterized as an untestable supposition, 


Thurstone’s Successive Intervals Scaling Model 
The Experimental Method 


In the method of successive intervals subjects are presented with a
set of n stimuli and asked to sort them into k ordered categories with respect
to some attribute. The proportion of times $f_{s,i}$ that a given stimulus s is
placed in category i is determined from the responses. If it is assumed that a
category actually represents a certain interval of stimulus values for a subject,
then the relative frequency with which a given stimulus is placed in a
particular category should represent the probability that the subject estimates
the stimulus value to lie within the interval corresponding to the category.
This probability is in turn simply the area under the distribution curve inside
the interval. So far scale values for the end points of the intervals are unknown,
but if the observed probabilities for a given stimulus are taken to represent
areas under a normal curve, then scale values may be obtained for both the
category boundaries and the stimulus.

Scale values for interval boundaries are determined by this model, 
and interval widths are not assumed equal, as in the method of equal appearing 
intervals. Essentially equivalent procedures for obtaining successive intervals 
scale values have been presented by Saffir [21], Guilford [10], Mosier [15], 
Bishop [3], Attneave [2], Garner and Hake [9], Edwards [7], Burros [5], and 
Rimoldi [19]. The basic rationale of the method had been previously outlined
by Thurstone in his absolute scaling of educational tests [23, 26]. Gulliksen


ERNEST ADAMS AND SAMUEL MESSICK 357 


[12], Diederich, Messick, and Tucker [6], and Bock [4] have described least
square solutions for successive intervals, and Rozeboom and Jones [20]
presented a derivation for scale values which utilized weights to minimize
sampling errors. Most of these papers contain the notion that the assumption
of normality can be checked by considering more than one stimulus. Although
one distribution of relative frequencies can always be converted to a normal
curve, it is by no means always possible to normalize simultaneously all
of the stimulus distributions, allowing unequal means and variances, on
the same base line. The specification of exact conditions under which this is
possible will now be attempted. In all that follows, the problem of sampling
fluctuations is largely ignored, and the model is presented for the errorless
case.

The Formal Model 


The set of stimuli, denoted S, has elements r, s, u, v, .... There is no
limit upon the admissible number of stimuli, although for the purpose of
testing the model, S must have at least two members. For each stimulus
s in S, and each category $i = 1, 2, \ldots, k$, the relative frequency $f_{s,i}$ with
which stimulus s is placed in category i is given. Formally f is a function
from the Cartesian product $S \times \{1, 2, \ldots, k\}$ to the real numbers. More
specifically, it will be the case that for each s in S, $f_s$ will be a probability
distribution over the set $\{1, 2, \ldots, k\}$. For the sake of an explicit statement
of the assumptions of the model, this fact will appear as an axiom, although
it must be satisfied by virtue of the method of determining the values of $f_{s,i}$.


Axiom 1. f is a function mapping $S \times \{1, 2, \ldots, k\}$ into the real numbers
such that for each s in S, $f_s$ is a probability distribution over $\{1, 2, \ldots, k\}$; i.e.,
for each s in S and $i = 1, \ldots, k$, $0 \le f_{s,i} \le 1$ and $\sum_{i=1}^{k} f_{s,i} = 1$.

The set S and the function f constitute the observables of the model.


Two more concepts which are not directly observed remain to be introduced.
The first of these is a set of numbers $t_1, \ldots, t_{k-1}$, which are the end points
of the intervals corresponding to the k categories. It is assumed that the
intervals are adjacent and that they cover the entire real line. Formally,
it will simply be assumed that $t_1, \ldots, t_{k-1}$ are an increasing series of real
numbers.

Axiom 2. Interval boundaries. $t_1, \ldots, t_{k-1}$ are real numbers, and for
$i = 2, \ldots, (k - 1)$, $t_{i-1} < t_i$.

Finally, the distribution corresponding to each stimulus s in S is
represented by a normal distribution function $N_s$.

Axiom 3. N is a function mapping S into normal distribution functions
over the real line.

Axioms 1-3 do not state fully the mathematical properties required for
the set S, the numbers $t_1, \ldots, t_{k-1}$, and the functions $N_s$. In the interests
of completeness, these will be stated in the following Axiom 0, which for
formal purposes should be referred to instead of Axioms 1-3.


Axiom 0. S is a non-empty set. k is a positive integer. f is a function
mapping $S \times \{1, \ldots, k\}$ into the closed interval [0, 1], such that for each s
in S, $\sum_{i=1}^{k} f_{s,i} = 1$. For $i = 1, \ldots, (k - 1)$, $t_i$ is a real number, and for
$i = 1, \ldots, (k - 2)$, $t_i \le t_{i+1}$. N is a function mapping S into the set of
normal distribution functions over the real numbers.

Axioms 2 and 3 state only the set-theoretical character of the elements
$t_i$ and $N_s$, and have no intuitive empirical content. The central hypothesis
of the theory states the connection between the observed relative frequencies
$f_{s,i}$ and the assumed underlying distributions $N_s$.


Axiom 4. (Fundamental hypothesis) For each s in S and $i = 1, \ldots, k$,

$f_{s,i} = \int_{t_{i-1}}^{t_i} N_s(\alpha)\, d\alpha.$

(Note that if $i = 1$, $t_{i-1}$ is set equal to $-\infty$, and if $i = k$, $t_i = \infty$.)

Axioms 1-4 state the formal assumptions of the theory although, because
the fundamental hypothesis (Axiom 4) involves the unobservables $N_s$ and
$t_i$, it is not directly testable in these terms. The question of testing the model
will be discussed in the next section. Scale values for the stimuli have not
yet been introduced. These are defined to be equal to the means of the
distributions $N_s$, and hence are easily derived. The function v will represent the
scale values of the stimuli.

Definition 1. v is the function mapping S into the real numbers such
that for each s in S, $v_s$ is the mean of $N_s$; i.e.,

$v_s = \int_{-\infty}^{\infty} \alpha N_s(\alpha)\, d\alpha.$


Testing the Model 


The model will be said to fit exactly if all of the testable consequences
of Axioms 1-4 are verified. Testable consequences of these axioms will be
those consequences which are formulated solely in terms of the observable
concepts S and f, or of concepts which are definable in terms of S and f.
If no further assumptions are made about an independent determination of
$t_1, \ldots, t_{k-1}$ and N, then the testable consequences are just those which
follow about f and S from the assumption that there exist numbers
$t_1, \ldots, t_{k-1}$ and functions $N_s$ which satisfy Axioms 1-4. In this model,
it is possible to give an exhaustive description of the testable consequences;
hence this theory is axiomatizable in the sense that it is possible to formulate
observable conditions which are necessary and sufficient to insure the existence
of the numbers $t_i$ and functions $N_s$. The derivation of these conditions will
proceed by stages.

Let $p_{s,i}$ be the cumulative distribution of the function f for stimulus s
and interval i.

Definition 2. For each s in S and $i = 1, \ldots, k$,

$p_{s,i} = \sum_{j=1}^{i} f_{s,j}.$

It follows from this definition and Axiom 4 that for each s in S and
$i = 1, \ldots, k$,

(1) $p_{s,i} = \int_{-\infty}^{t_i} N_s(\alpha)\, d\alpha.$


Using the table for the cumulative distribution of the normal curve with
zero mean and unit variance, the numbers $z_{s,i}$ may be determined such that

(2) $p_{s,i} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_{s,i}} e^{-\alpha^2/2}\, d\alpha.$
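
The table lookup in (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example proportions are hypothetical, and the standard library's inverse normal cumulative distribution stands in for the printed table. Categories with cumulative proportion exactly 0 or 1 are not handled.

```python
from statistics import NormalDist


def cumulative_proportions(f_row):
    """Definition 2: p_{s,i} is the running sum f_{s,1} + ... + f_{s,i}."""
    p, total = [], 0.0
    for f in f_row:
        total += f
        p.append(total)
    return p


def normal_deviates(f_row):
    """Equation (2): unit-normal deviates z_{s,i} for i = 1, ..., k-1.

    The last cumulative proportion equals 1, so z_{s,k} is infinite and is
    omitted. inv_cdf requires 0 < p < 1, so empty end categories are not
    handled by this sketch.
    """
    std_normal = NormalDist()  # zero mean, unit variance
    return [std_normal.inv_cdf(p) for p in cumulative_proportions(f_row)[:-1]]


# Hypothetical stimulus sorted into k = 4 categories.
f_s = [0.1, 0.4, 0.4, 0.1]
z_s = normal_deviates(f_s)
```

With k = 4 categories the sketch returns three finite deviates, one per interval boundary.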


(Note that for $i = k$, $z_{s,i}$ will be infinite.) $N_s$ is a normal distribution function
and must have the form:

(3) $N_s(\alpha) = \frac{1}{\sigma_s \sqrt{2\pi}} \exp\left[ -\frac{(\alpha - v_s)^2}{2\sigma_s^2} \right],$

where $\sigma_s^2$ is the variance of $N_s$ about its mean $v_s$. Equations (1), (2), and (3)
yield the conclusion that for each s in S and $i = 1, \ldots, k$,

(4) $z_{s,i} = \frac{t_i - v_s}{\sigma_s}.$

In (4) the numbers $z_{s,i}$ on the left are known transformations of the
observed proportions $f_{s,i}$, while the numbers $t_i$, $v_s$, and $\sigma_s$ on the right are
unknown. Suppose however that r is a fixed member of S. Then it is
possible to solve for all the unknowns in terms of the z's and of
$v_r$ and $\sigma_r$, the mean and standard deviation of the fixed stimulus r.
Solutions are, for s in S and $i \ne j$,

(5) $t_i = \sigma_r z_{r,i} + v_r,$

(6) $\sigma_s = \sigma_r \frac{z_{r,i} - z_{r,j}}{z_{s,i} - z_{s,j}},$

(7) $v_s = \sigma_r \frac{z_{s,i} z_{r,j} - z_{r,i} z_{s,j}}{z_{s,i} - z_{s,j}} + v_r.$
The necessary and sufficient condition that the system of equations
have a solution, and hence that $t_i$, $v_s$, $\sigma_s$ be determinable using (5), (6),
and (7), is that all $z_{s,i}$ be linear functions of each other in the following sense.
For all r and s in S, there exist real numbers $a_{r,s}$ and $b_{r,s}$ such that for each
$i = 1, \ldots, k - 1$,

(8) $z_{r,i} = a_{r,s} z_{s,i} + b_{r,s}.$

The required numbers $a_{r,s}$ and $b_{r,s}$ exist if and only if for each r and s, the ratio

(9) $\frac{z_{r,i} - z_{r,j}}{z_{s,i} - z_{s,j}}$

is independent of i and j.

If constants $a_{r,s}$ and $b_{r,s}$ satisfying (8) exist, then they are related to
the scale values $v_s$ and the standard deviations $\sigma_s$ in a simple way. For each
r, s in S,

(10) $a_{r,s} = \sigma_s / \sigma_r,$

and

(11) $b_{r,s} = (v_s - v_r) / \sigma_r.$


Clearly the arbitrary choice of the constants $v_r$ and $\sigma_r$ in (5), (6), and (7)
represents the arbitrary choice of origin and unit in the scale. Since scale
values of $t_i$ and $v_s$ are uniquely determined once $v_r$ and $\sigma_r$ are chosen, the
scale values are unique up to a linear transformation; i.e., an interval scale
of measurement has been determined. It should be noted that this model
does not require equality of standard deviations (or what Thurstone has
called discriminal dispersions [25]) but provides for their determination
from the data by equation (6). This adds powerful flexibility in its possible
applications.

It remains only to make a remark about the necessary and sufficient
condition which a set of observed relative frequencies $f_{s,i}$ must fulfill in
order to satisfy the model. This necessary and sufficient condition is simply
that the numbers $z_{s,i}$, which are defined in terms of the observed relative
frequencies, be linearly related as expressed in (8). This can be determined
by seeing if the ratios computed from (9) are independent of i and j, or by
evaluating for all s, r the linearity of the plots of $z_{s,i}$ against $z_{r,i}$. Hence
for this model there is a simple decision procedure for determining whether
or not a given set of errorless data fits.
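
The decision procedure for errorless data can be sketched as follows. This is a hedged illustration, not the authors' computing routine: the function names, the tolerance, and the example deviates are assumptions. The deviates are taken as given for the k - 1 finite boundaries, each stimulus is compared with one fixed reference stimulus r, and distinct boundaries are assumed so that no difference in the denominator is zero.

```python
def fits_exactly(z, r, tol=1e-9):
    """Condition (9): for every s, the ratio
    (z[r][i] - z[r][j]) / (z[s][i] - z[s][j]) is the same for all i != j."""
    n = len(z[r])
    for s in z:
        ratios = [(z[r][i] - z[r][j]) / (z[s][i] - z[s][j])
                  for i in range(n) for j in range(i + 1, n)]
        if max(ratios) - min(ratios) > tol:
            return False
    return True


def scale_values(z, r, v_r=0.0, sigma_r=1.0):
    """Boundaries from (5), dispersions from (6), scale values from (4),
    for an arbitrary choice of origin v_r and unit sigma_r."""
    t = [sigma_r * z_ri + v_r for z_ri in z[r]]                          # (5)
    sigma, v = {}, {}
    for s, z_s in z.items():
        sigma[s] = sigma_r * (z[r][0] - z[r][-1]) / (z_s[0] - z_s[-1])   # (6)
        v[s] = t[0] - sigma[s] * z_s[0]                                  # (4)
    return t, v, sigma


# Hypothetical errorless deviates: stimulus "A" has mean 0 and unit
# dispersion, stimulus "B" has mean 1 and dispersion 2.
boundaries = [-1.0, 0.0, 0.5, 2.0]
z = {"A": list(boundaries), "B": [(t_i - 1.0) / 2.0 for t_i in boundaries]}
ok = fits_exactly(z, "A")
t, v, sigma = scale_values(z, "A")

# A hypothetical third stimulus whose deviates are not linear in those of "A".
z_bad = dict(z, C=[-1.0, 0.0, 1.0, 1.2])
bad = fits_exactly(z_bad, "A")
```

Taking stimulus "A" as the reference with $v_r = 0$ and $\sigma_r = 1$ fixes the origin and unit, and the sketch then recovers the boundaries and the mean and dispersion assumed for "B".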

If $z_{s,i}$ and $z_{r,i}$ are found to be linearly related for all s, r in S, the
assumptions of the scaling model are verified for that data. If the z's are not linearly
related, then assumptions have been violated. For example, the normal curve
may not be an appropriate distribution function for the stimuli and some other
function might yield a better fit (cf. [11, 12]). Or perhaps the responses cannot
be summarized unidimensionally in terms of projections on the real line
representing the attribute [11]. If the stimuli are actually distributed in a
multidimensional space, then judgments of projections on one of the attributes
may be differentially distorted by the presence of variations in other
dimensions. This does not mean that stimuli varying in several dimensions may
not be scaled satisfactorily by the method of successive intervals, but rather
that if the model does not fit, such distortion effects might be operating.
A multidimensional scaling model [14] might prove more appropriate in
such cases.

In practice the set of points $(z_{s,i}, z_{r,i})$ for $i = 1, \ldots, (k - 1)$ will
never exactly fit the straight line of (8) but will fluctuate about it. It remains
to be decided whether this fluctuation represents systematic departure
from the model or error variance. In the absence of a statistical test for
linearity, the decision is not precise, although the linearity of the plots may
still be evaluated, even if only by eye. One approach is to fit the obtained
points to a straight line by the method of least squares and then evaluate
the size of the obtained minimum error [4, 6, 12]. In any event, the test of
the model is exact in the errorless case, and the incorporation of a suitable
sampling theory would provide decision criteria for direct experimental
applications.
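
The least squares approach mentioned above can be illustrated with an ordinary straight-line fit of one stimulus's deviates on another's. This is a minimal sketch, not the specific procedures of [4, 6, 12]: the function name and the perturbed example values are hypothetical.

```python
def least_squares_line(x, y):
    """Fit y = a*x + b by ordinary least squares; return a, b, and the
    minimum sum of squared errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


# Deviates of a stimulus s, and those of a reference r lying exactly on
# the line z_r = 2 z_s + 1 (the form of equation (8)).
z_s = [-1.0, -0.5, -0.25, 0.5]
z_r_exact = [2.0 * zi + 1.0 for zi in z_s]
a_exact, b_exact, sse_exact = least_squares_line(z_s, z_r_exact)

# Hypothetical observed deviates that fluctuate about the same line.
z_r_observed = [-1.05, 0.02, 0.48, 2.03]
a_obs, b_obs, sse_obs = least_squares_line(z_s, z_r_observed)
```

The size of `sse_obs` relative to what sampling error alone would produce is the quantity left to judgment in the absence of a statistical test for linearity.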

A Generalization of the Successive Intervals Model

The successive intervals model discussed in the previous section can
be generalized in a number of ways. One generalization, treated in detail
by Torgerson [27], considers each interval boundary to be the mean of a
subjective distribution with positive variance. Another approach toward
generalizing the model is to weaken the requirement of normal distributions
of stimulus scale values. Formally, this generalization amounts to enlarging
the class of admissible distribution functions. Instead of specifying exactly
which distribution functions are allowed in the generalization, assume an
arbitrary set $\psi$ of distribution functions over the real line, to which it is
required that the stimulus distributions belong. In formalizing the model,
$\psi$ is characterized simply as a set of distribution functions over the real
line. Axiom 3 may be replaced by a new axiom specifying that C is a function
mapping S into elements of $\psi$; i.e., for each s in S, $C_s$ is a member of $\psi$.

One final assumption about $\psi$ is to be added: namely, if $\psi$ contains a
distribution function C, it must contain all linear transformations of C.
A linear transformation of a distribution function C is defined as any other
distribution function which can be obtained from C by a shift of origin and
a scale transformation. A stretch along the horizontal axis must be
compensated for by a contraction on the vertical axis in order that the
function remain a probability density function. Algebraically, a linear
transformation has the following form. Let D and D' be distribution
functions; D' is a linear transformation of D if there exists a positive real
number a and a real number b such that for all x,

$D'(x) = aD(ax + b).$


This is not truly a linear transformation because of multiplication by a on the
ordinate, but for lack of a better term this phrase is used. The reason for
requiring that the class $\psi$ of distribution functions be closed under linear
transformations is to insure that in any determination of stimulus scale
values it will be possible to convert them by a linear transformation into
another admissible set of scale values; i.e., the stimulus values obtained are
to form an interval scale. If the set $\psi$ is not closed under linear transformations,
in general it will not be possible to alter the scale by an arbitrary linear
transformation.


Axiom 3'. $\psi$ is a set of distribution functions over the real numbers, and
C is a function mapping S into $\psi$. For all D in $\psi$, if a is a positive real number
and b is a real number, then the function D' such that for all x,

$D'(x) = aD(ax + b)$

is a member of $\psi$.

It is to be observed that the set of normal distributions has the required 
property of being closed under linear transformations. This set is in fact a 
minimal class of this type, in the sense that all normal distribution functions 
can be generated from a single normal distribution function by linear trans- 
formations. 
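
The closure property can be checked numerically for the normal case. The sketch below is an illustration only, with hypothetical names and constants: it applies the transformation $D'(x) = aD(ax + b)$ to the unit normal density and compares the result with the normal density having mean $-b/a$ and standard deviation $1/a$.

```python
from statistics import NormalDist


def linear_transform(D, a, b):
    """Return D' with D'(x) = a * D(a*x + b), for a > 0."""
    return lambda x: a * D(a * x + b)


unit_normal = NormalDist().pdf               # zero mean, unit variance
a, b = 2.0, -3.0                             # hypothetical constants
transformed = linear_transform(unit_normal, a, b)

# Expected result: a normal density with mean -b/a and deviation 1/a.
expected = NormalDist(mu=-b / a, sigma=1.0 / a).pdf
largest_gap = max(abs(transformed(x) - expected(x))
                  for x in [-1.0, 0.0, 1.0, 1.5, 2.0, 3.0])
```

The two densities agree up to rounding, which is the sense in which a single normal density generates the whole normal family.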

Finally, Axiom 4 is replaced by an obvious generalization which specifies
the connection between the observed $f_{s,i}$, the distribution functions $C_s$,
and the interval end points $t_i$.

Axiom 4'. For each s in S and $i = 1, \ldots, k$,

$f_{s,i} = \int_{t_{i-1}}^{t_i} C_s(x)\, dx.$

(Here again $t_0 = -\infty$ and $t_k = \infty$.) The stimulus values are defined as
before to be the means of the distribution functions $C_s$.

Definition 1'. v is the function mapping S into the real numbers such
that for each s in S, $v_s$ is the mean of $C_s$, i.e.,

$v_s = \int_{-\infty}^{\infty} x\, C_s(x)\, dx.$

The problem now is to specify the class of admissible distribution
functions $\psi$. Each specification of this class amounts to a theory about the
underlying stimulus distributions. If the hypothesis of normality is altered or
weakened, what assumptions can replace it? Omitting any assumption about
the form of the distribution functions would amount to letting $\psi$ be the set
of all distribution functions over real numbers. If no assumption whatever
is made about the forms of $C_s$, then the theory is very weak. Every set of
data will fit the theory, and the scale values of $t_i$ can be determined only
on an ordinal scale. It is always possible to determine distribution functions
$C_s$ satisfying Axiom 4' for arbitrarily specified $t_i$. To show this it is only
necessary to construct them in accordance with the following definition.

$C_s(x) = \begin{cases} \dfrac{f_{s,i}}{t_i - t_{i-1}}, & t_{i-1} < x \le t_i, \quad i = 1, \ldots, k, \\ 0 & \text{otherwise.} \end{cases}$
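
The construction can be illustrated with a step density that spreads each observed proportion evenly over its interval. This is a sketch under a loudly stated assumption: the outermost end points are given hypothetical finite values so that every interval has finite width, which is a convenience of the example and not something stated in the text. All names and numbers are illustrative.

```python
def step_density(f_row, bounds):
    """Density equal to f_i / (t_i - t_{i-1}) on (t_{i-1}, t_i], else 0.

    bounds holds k + 1 increasing finite end points t_0 < ... < t_k
    (finite outer end points are an assumption of this sketch).
    """
    def C(x):
        for i, f in enumerate(f_row):
            lo, hi = bounds[i], bounds[i + 1]
            if lo < x <= hi:
                return f / (hi - lo)
        return 0.0
    return C


def interval_area(C, lo, hi, steps=1000):
    """Midpoint-rule area under C between lo and hi."""
    h = (hi - lo) / steps
    return sum(C(lo + (j + 0.5) * h) for j in range(steps)) * h


# Hypothetical proportions and arbitrarily chosen end points.
f_s = [0.2, 0.5, 0.3]
bounds = [-3.0, -1.0, 0.5, 4.0]
C_s = step_density(f_s, bounds)
areas = [interval_area(C_s, bounds[i], bounds[i + 1]) for i in range(3)]
```

Because the areas reproduce the proportions for any increasing choice of end points, data alone cannot restrict the boundaries beyond their order when no form is assumed for the distributions.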


Non-normal Distributions of Specified Form 

It is clearly necessary to make some restrictions on $\psi$ if the scale values
are to be determined uniquely up to a linear transformation. It will next
be shown that any minimal class of distribution functions, in the sense of
a class all of whose members are generated from a single member by linear
transformations, has the desired property of generating a linear scale of
stimulus values when the model fits. For the present assume that $\psi$ is a
minimal class of distribution functions.

Assumption 1. There exists a distribution function D such that for all
distribution functions D' in $\psi$ there exists a positive real number a and a
real number b such that for all x,

$D'(x) = aD(ax + b).$
To show that if Assumption 1 is satisfied the scale values are obtained
on an interval scale, we proceed as follows. Axiom 3' and Assumption 1 imply
that for all s in S, there exists a positive real number $a_s$ and a real number
$b_s$ such that for all x,

(12) $C_s(x) = a_s D(a_s x + b_s),$

where the function D on the right side of (12) is a fixed function of some
specified form linearly related to all the functions D' in $\psi$. According to
Axiom 4', then, for each s in S, and $i = 1, \ldots, k$,

(13) $f_{s,i} = \int_{t_{i-1}}^{t_i} a_s D(a_s x + b_s)\, dx.$

If $\pi$ is the cumulative distribution corresponding to D, and the cumulative
distributions $p_{s,i}$ are defined as before, then

(14) $p_{s,i} = \int_{-\infty}^{t_i} a_s D(a_s x + b_s)\, dx = \pi(a_s t_i + b_s).$


Assuming that the function $\pi$ is strictly monotone increasing, then, knowing
the form of function D, it is possible to determine uniquely the numbers
$z_{s,i}$ such that for each s in S and $i = 1, \ldots, k$,

(15) $p_{s,i} = \pi(z_{s,i}).$

Equations (14) and (15) imply immediately that

(16) $z_{s,i} = a_s t_i + b_s$

for all s in S and $i = 1, \ldots, k$. It is clear from (15) why it is necessary
to assume that $\pi$ is strictly monotone increasing. If it were not, there would
not in general be a unique $z_{s,i}$ determined by (15); hence the scale values
based on $z_{s,i}$ would not be unique. It is also seen that (4), relating $z_{s,i}$ to
$t_i$, $v_s$ and $\sigma_s$ in the normal distribution model, is simply a particular case of
(16) here. The connection between $a_s$, $b_s$ and $\sigma_s$ and $v_s$ is

$\sigma_s = 1/a_s, \quad v_s = -b_s/a_s.$


In (16), as in the corresponding set of equations obtained from the
normality assumption, the numbers on the left are known, and the numbers
on the right are unknown. As before, if two numbers $a_r$ and $b_r$ are arbitrarily
determined for a fixed stimulus r, then the $t_i$ are uniquely determined by the
following equation.

(17) $t_i = (z_{r,i} - b_r)/a_r, \quad i = 1, \ldots, k.$


The scale values for the stimuli, however, cannot be directly determined
from the coefficients $z_{s,i}$, $a_r$ and $b_r$ without first specifying the mean m
of the basic distribution D. If m is the mean of D, then $v_s$, which was defined
as the mean of $C_s$, is determined by

(18) $v_s = (m - b_s)/a_s.$

Both the $a_s$ and the $b_s$ in (18) can be determined in terms of $z_{s,i}$, $a_r$ and $b_r$ by
(19) and (20); hence $v_s$ is immediately determinable in terms of just these
quantities by (18).

(19) $a_s = a_r \frac{z_{s,i} - z_{s,j}}{z_{r,i} - z_{r,j}},$

(20) $b_s = z_{s,i} - \frac{z_{s,i} - z_{s,j}}{z_{r,i} - z_{r,j}} (z_{r,i} - b_r).$


It is clear then that the scale values $t_i$ and $v_s$ are determined up to a
linear transformation. Furthermore, necessary and sufficient conditions
that a set of data fit the model are simply that the ratios of differences in
z's on the right in (19) be independent of i and j; i.e., that the z's be linearly
related.
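
The solution in (17) through (20) can be sketched for one concrete non-normal choice of D. The logistic distribution is used here purely as an assumed example (it is not named in the text): its cumulative distribution is strictly increasing, its inverse is the logit, and its mean m is zero. Function names and data are hypothetical, and only the k - 1 cumulative proportions strictly between 0 and 1 are supplied.

```python
import math


def logit(p):
    """Inverse of the logistic cumulative distribution 1 / (1 + exp(-x))."""
    return math.log(p / (1.0 - p))


def generalized_scale(p, r, a_r=1.0, b_r=0.0, m=0.0):
    """Scale values under a logistic D, following (15) and (17)-(20)."""
    z = {s: [logit(x) for x in row] for s, row in p.items()}     # (15)
    t = [(z_ri - b_r) / a_r for z_ri in z[r]]                     # (17)
    a, b, v = {}, {}, {}
    for s, z_s in z.items():
        ratio = (z_s[0] - z_s[-1]) / (z[r][0] - z[r][-1])
        a[s] = a_r * ratio                                        # (19)
        b[s] = z_s[0] - ratio * (z[r][0] - b_r)                   # (20)
        v[s] = (m - b[s]) / a[s]                                  # (18)
    return t, a, b, v


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical errorless data generated from known constants.
true_t = [-1.0, 0.0, 2.0]
p = {
    "r": [logistic_cdf(1.0 * t_i + 0.0) for t_i in true_t],
    "s": [logistic_cdf(0.5 * t_i + 1.0) for t_i in true_t],
}
t, a, b, v = generalized_scale(p, "r")
```

With $a_r = 1$ and $b_r = 0$ fixing origin and unit, the sketch recovers the generating boundaries and the constants of stimulus s, and its scale value comes out as $(0 - 1)/0.5 = -2$.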




The Forms of the Distributions Unspecified 
Here the hypothesis is as in Assumption 1;
i.e., it is assumed that the underlying distributions all belong to one minimal
class, but that the class can be generated by any distribution function D.
Interestingly enough, in this case it is still possible to test the model and
to obtain more than ordinal information about the scale values. If it is
assumed that the stimulus distributions all belong to one minimal family
generated by a function D, but D is unknown, all of the deductions up through
(14) go through, although in this case the function $\pi$ is also unknown. Now,
of course, it is impossible to discover the numbers $z_{s,i}$ by solving (15), but
if it is postulated that the function $\pi$ is strictly monotone increasing, it is
still possible to obtain some information about the numbers $(a_s t_i + b_s)$.
Since $\pi$ is a cumulative distribution it is monotone increasing; however, it
will only be strictly monotone increasing in case the distribution function
D is never zero. This assumption is made explicit in Assumption 2.


Assumption 2. For all x, $D(x) > 0$.

Now, if $\pi$ is strictly monotone increasing, then it follows that $\pi(x) > \pi(y)$
if and only if $x > y$. If (14) holds, then it will be the case that for all r, s
in S and $i, j = 1, \ldots, k$,

(21) $p_{s,i} > p_{r,j}$ if and only if $a_s t_i + b_s > a_r t_j + b_r$.

Therefore from an ordering on the numbers $p_{s,i}$ one can obtain a system of
inequalities involving the constants $a_s$, $b_s$ and $t_i$. If it is further specified
(as is required for the conditions of the problem) that $a_s > 0$ for all s, then
this set of inequalities will not in general have a solution.
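
Checking a candidate solution against the system (21) is straightforward, even though finding one may be hard. The sketch below only verifies a proposed set of constants; it does not search for a solution. The function name, the data, and the use of a logistic curve to generate hypothetical errorless proportions are all assumptions of the example.

```python
import math


def satisfies_21(p, a, b, t):
    """True when p[s][i] > p[r][j] exactly when
    a[s]*t[i] + b[s] > a[r]*t[j] + b[r], with every a[s] positive."""
    if any(a_s <= 0 for a_s in a.values()):
        return False
    cells = [(p[s][i], a[s] * t[i] + b[s])
             for s in p for i in range(len(t))]
    return all((p1 > p2) == (x1 > x2)
               for p1, x1 in cells for p2, x2 in cells)


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical errorless proportions generated by a strictly increasing curve.
t_true = [-1.0, 0.0, 2.0]
a_true = {"r": 1.0, "s": 0.5}
b_true = {"r": 0.0, "s": 0.75}
p = {s: [logistic_cdf(a_true[s] * t_i + b_true[s]) for t_i in t_true]
     for s in a_true}

good = satisfies_21(p, a_true, b_true, t_true)
wrong_order = satisfies_21(p, a_true, b_true, [2.0, 0.0, -1.0])
```

The generating constants satisfy every inequality, while a candidate with the boundaries in reversed order violates the ordering of the proportions and is rejected.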

However, whether or not a set of data fits the model may still be
determined. The necessary and sufficient condition for fit is that there exist numbers
$a_s$, $t_i$ and $b_s$ (where $a_s > 0$) satisfying the system of inequalities (21). If
this set of inequalities has a solution, then the interval boundaries may be
taken to be the $t_i$ satisfying (21). To determine the scale values of the stimuli,
it is first necessary to construct a distribution function which can represent
the data. This is done in the following way. A differentiable, strictly
increasing function $\pi(x)$ is constructed by connecting the discrete points

$\pi(a_s t_i + b_s) = p_{s,i}$

with any smooth, strictly monotone increasing curve. If, as is usual, there is
a finite number of stimuli, then such a curve can always be constructed.
Finally, the distribution function D is defined by

(22) $D(x) = \pi'(x).$




Then, if the mean of the distribution D is m, the values $v_s$ of the stimuli
are determined by (18), $v_s = (m - b_s)/a_s$. As far as the determination of
the $v_s$ is concerned, it can be seen that they depend solely on the previously
determined a and b and on the mean m, which can be regarded as an additional
arbitrary constant in the determination of the $v_s$.

The remaining point of discussion for this model is the determination 
of the degree of uniqueness of the scale values. Finding the set of all possible 
solutions to the inequalities (21) presents, in general, extreme difficulty. 
One thing that can be simply determined is the class of what might be called 
the universal transformations of the solutions of the system of inequalities. 
A universal transformation is one which, applied to a solution of any set of 
inequalities, yields another solution to the same set of inequalities. By noting 
a close connection between the theory of the inequalities (21) and a two- 
dimensional affine geometry with a distinguished set of horizontal and vertical 
lines, it can be shown [1] that the class of universal transformations for this 
model is a subset of the affine transformations. The universal transformations 
of the interval boundaries $t_i$ are the linear ones, and of the $a_s$ are
multiplications by a positive constant. The $b_s$ also are determined up to a linear
transformation, and hence so are the scale values $v_s$ (although the additional
arbitrary constant m also enters into their determination).

There is also an interesting special case in which, even though there is 
only a finite number of observations, the scale values of the t; are determined 
up to a linear transformation. This might be called the special case of equal 
intervals, in which differences in successive $t_i$ are all the same. If, for example,
there exist stimuli with such relations among corresponding p’s as p,,, = 
Puisi = Deine y Paiti = Puise y Рун = Paisi , ete., it is possible to deter- 
mine that successive intervals are equal [1]. 

The fact that scale values obtained in this model, at least under certain
circumstances, are unique up to a linear transformation has two interesting
consequences for the original successive intervals model based on the
normality hypothesis. (i) If in the errorless case the original model fits, then
no other successive intervals model which assumes a different form for the
distribution functions will fit. The reason for this is that the forms of the distribution
functions (or the cumulative distributions) are determined by the values of
$p_{s,i}$ lying above the point $t_i$. Hence, if the $t_i$ are determined up to linear
transformation, so are the curves $p_{s,i}$. (ii) Where the normality assumption
does not fit the data it is theoretically possible to use the present generalization
to obtain a scale. Then the deviation of the scale values from those obtained
under a normality requirement can be evaluated. This, at least in principle,
provides a second kind of goodness of fit besides the usual least squares
regression methods employed where the data do not exactly fit the Thurstone
model.




REFERENCES

[1] Adams, E. and Messick, S. An axiomatization of Thurstone's successive intervals
and paired comparisons scaling models. Stanford, Calif.: Stanford Univ., Applied
Mathematics and Statistics Laboratory, ONR Technical Report No. 12, 1957.

[2] Attneave, F. A method of graded dichotomies for the scaling of judgments. Psychol.
Rev., 1949, 56, 334-340.

[3] Bishop, R. Points of neutrality in social attitudes of delinquents and non-delinquents.
Psychometrika, 1940, 5, 35-45.

[4] Bock, R. D. Note on the least squares solution for the method of successive categories.
Psychometrika, 1957, 22, 231-240.

[5] Burros, R. H. The estimation of the discriminal dispersion in the method of successive
intervals. Psychometrika, 1955, 20, 299-305.

[6] Diederich, G., Messick, S., and Tucker, L. R. A general least squares solution for
successive intervals. Psychometrika, 1957, 22, 159-173.

[7] Edwards, A. L. The scaling of stimuli by the method of successive intervals. J. appl.
Psychol., 1952, 36, 118-122.

[8] Edwards, A. L. and Thurstone, L. L. An internal consistency check for scale values
determined by the method of successive intervals. Psychometrika, 1952, 17, 169-180.

[9] Garner, W. R. and Hake, H. W. The amount of information in absolute judgments.
Psychol. Rev., 1951, 58, 446-459.

[10] Guilford, J. P. The computation of psychological values from judgments in absolute
categories. J. exp. Psychol., 1938, 22, 32-42.

[11] Gulliksen, H. Paired comparisons and the logic of measurement. Psychol. Rev., 1946,
53, 199-213.

[12] Gulliksen, H. A least squares solution for successive intervals assuming unequal
standard deviations. Psychometrika, 1954, 19, 117-139.

[13] Luce, R. D. A theory of individual choice behavior. Bureau Appl. Soc. Res., Columbia
Univ., 1957. (Mimeo.)

[14] Messick, S. Some recent theoretical developments in multidimensional scaling. Educ.
psychol. Measmt, 1956, 16, 82-100.

[15] Mosier, C. I. A modification of the method of successive intervals. Psychometrika, 1940,
5, 101-107.

[16] Mosteller, F. Some miscellaneous contributions to scale theory: Remarks on the method
of paired comparisons. Cambridge: Harvard Univ., Lab. Soc. Relations, Report No.
10, Ch. III.

[17] Mosteller, F. Remarks on the method of paired comparisons: I. The least squares
solution assuming equal standard deviations and equal correlations. Psychometrika,
1951, 16, 3-9.

[18] Mosteller, F. Remarks on the method of paired comparisons: III. A test of significance
for paired comparisons when equal standard deviations and equal correlations are
assumed. Psychometrika, 1951, 16, 207-218.

[19] Rimoldi, H. J. A. and Hormaeche, M. The law of comparative judgment in the
successive intervals and graphic rating scale methods. Psychometrika, 1955, 20, 307-318.

[20] Rozeboom, W. W. and Jones, L. V. The validity of the successive intervals method of
psychometric scaling. Psychometrika, 1956, 21, 165-183.

[21] Saffir, M. A comparative study of scales constructed by three psychophysical methods.
Psychometrika, 1937, 2, 179-198.

[22] Stevens, S. S. Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.),
Handbook of experimental psychology. New York: Wiley, 1951.

[23] Thurstone, L. L. A method of scaling psychological and educational tests. J. educ.
Psychol., 1925, 16, 433-451.

[24] Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.

[25] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 424-432.

[26] Thurstone, L. L. The unit of measurement in educational scales. J. educ. Psychol.,
1927, 18, 505-524.

[27] Torgerson, W. S. A law of categorical judgment. In L. S. Clark (Ed.), Consumer
behavior. New York: New York Univ. Press, 1954.

Manuscript received 2/12/58
Revised manuscript received 4/21/58



PSYCHOMETRIKA—VOL. 23, No. 4 
DECEMBER, 1958 


THE SINGLE LATIN SQUARE DESIGN IN PSYCHOLOGICAL
RESEARCH


JOHN GAITO
AIR CREW EQUIPMENT LABORATORY, PHILADELPHIA†


The expected value of mean square concept is used to determine the
effects of the presence of interactions in the single Latin square design on F tests.
The results indicate that as the number of random effects included in the
experiment increases, more F tests are unbiased, and that some of these are valid
F tests. However, when F test bias does occur it is almost always of a negative
nature so that the conclusions stated are conservative ones. Positive F test
bias may occur when the triple interaction is extant and when zero or one
random variate is included in the experiment.


Psychologists, mathematical statisticians, and others who utilize
statistical techniques have made extensive use of Latin square designs.
Generally, it has been assumed that interactions must be nonextant for
results to be adequate. Thus some have raised questions concerning the
suitability of these techniques for psychological research in which one of the
variables is subjects, since interactions involving subjects frequently occur.
For example, McNemar [6] maintains that if the interactions were not zero,
obtained F values would not follow the F distribution and too many significant
results would occur, a positive F test bias. By McNemar's argument, nonzero
interaction results in a residual term larger than the ordinary error component,
but the combination of the two sources yields a residual smaller than the
interaction that should be used as the denominator for the F test. Lindquist
[5] also has maintained that the single Latin square design [...]
useful in educational and psychological research. He argues that each main
effect is confounded with the interaction of the other two factors and with
the triple interaction; Lindquist also stresses the ambiguous character of
the residual.


hr, — А s of mean 5 ; 
, Through use of expected values ted. Thus Gourlay [2], using à 
~atin square designs may be better evaluated. 3 


i i . a valid application of the 
Variance component analysis, indicates that for 2: = же. 
Latin Square techniques interactions do not qe ca capri wee’ 
More, cont rary to MeNemar's assertion, he found ths 


' Values might result, a negative F test hias ference to two main types of 
Gourlay investigated this problem in reterent™ 

ата . 7 

nteractions that occur in psychology- 


†Now at Wilkes College, Wilkes-Barre, Pa.


(i) Each individual or unit receives only one of several treat-
ments and is represented by one measurement in the data. In this
case interaction is between main effects.

(ii) Repeated measurements are made on the same individuals
or groups. In this case earlier measurements may interact with those
that follow.


However, a more general and instructive procedure would be to determine 
the components of variance included within each mean square under four 
conditions: zero, one, two, and three random variates. The first condition 
corresponds to the fixed variate model, the last to the random variate model, 
and the others to the mixed model. This procedure would exhaust all possi- 
bilities in the single Latin square design and would permit an evaluation 
of the behavior of each test of significance. 

In obtaining the expected value of mean square components, the pro- 
cedure of Anderson and Bancroft [1], Greenwood [3], Kempthorne [4], and 
Tukey [9] or that used by McNemar [7] and Mood [8] might be employed.
The two procedures differ in the components included in the expected value
of mean square when random and fixed effects occur. The first procedure 
differs from the second by excluding from the random effect the variance 
due to the interaction of the fixed and random variates. The former procedure 
is favored by several intuitive arguments and will be used in this paper. 

In a complete factorial design the rules for obtaining expected components
may be stated simply [see 3]. The expected value of mean square for any
source of variation is $\sigma_e^2$ (variance due to error) plus the $\sigma^2$ term having
exactly the subscripts corresponding to the letters describing the source of
variation. It further includes all $\sigma^2$ terms which have these same subscripts,
providing the remaining subscripts all refer to random effects. These rules
may be extended to the Latin square design with few modifications. The
expected value of mean square for each main effect would contain the triple
interaction and the double interaction of the other two factors, in addition
to those required as stated above. Likewise, the residual would contain $\sigma_e^2$ and
all interaction variances. However, the coefficients of all components except
$\sigma_e^2$ would not be the same as those in the complete factorial design since
all levels of the three variables are not included in the Latin square design.
These coefficients have been indicated in a recent paper by Wilk and Kemp-
thorne [10], which presents a generalized derivation for Latin square designs
and a special case where only fixed effects are involved. The coefficients and
components are presented in Tables 1-4.

This paper will attempt to demonstrate the possible defects inherent in
the single Latin square tests of significance when interactions occur under
the four conditions mentioned above. If the variable in the experiment
includes all levels in the population of interest, then the effect is considered
fixed. If the variable represents a random sampling from a population and
the sample includes only a few levels of this variable, the effect is considered
random. For each of the four conditions the effects of the presence of one
or more interactions on tests of significance will be considered.
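
The rules stated above for which components enter each expected mean square can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the string labels for the components ("e", "ab", "abc", and so on) are assumptions introduced here, and the sketch records only which components are present, not their coefficients.

```python
FACTORS = ("a", "b", "c")


def ems_components(source, random_factors):
    """Components in the expected mean square of one main effect.

    source: "a", "b" or "c"; random_factors: set of the random factors.
    Labels are hypothetical: "e" is error, "ab" the A x B interaction, etc.
    """
    others = [f for f in FACTORS if f != source]
    comps = {"e", source}  # error variance plus the effect itself
    # Factorial rule: an interaction containing the source is included
    # when its remaining subscript refers to a random effect.
    for other in others:
        if other in random_factors:
            comps.add("".join(sorted(source + other)))
    # Latin square modification: the triple interaction and the double
    # interaction of the other two factors always appear.
    comps.add("abc")
    comps.add("".join(others))
    return comps


def residual_components():
    """The residual contains error and all interaction variances."""
    return {"e", "ab", "ac", "bc", "abc"}


# Example: A fixed, B and C random (the two random variates condition).
for effect in FACTORS:
    print(effect.upper(), sorted(ems_components(effect, {"b", "c"})))
```

Calling `ems_components` with an empty set, one, two, or three random factors gives the component patterns of the four conditions considered in the paper.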


Zero Random Variate Model 


In this model the levels of the three effects include the total population
of levels and are considered as fixed. The paradigm for this model is shown
in Table 1. The coefficients are either zero or one for all components except
for the triple interaction and the three single effects. The coefficients for
$\sigma_{abc}^2$ in A, B, and C are the same, $1 - (2/t)$, but differ from that of the residual,
$1 - (3/t)$. Each of the three single effects has a coefficient of t, the number
of levels of each of the three main effects included in the experiment. The
procedure below can be followed by the reader by eliminating components
that do not occur in each case.
TABLE 1

Zero Random Variates Model

EMS (Coefficients of)

Variates   $\sigma_e^2$  $\sigma_{abc}^2$  $\sigma_{bc}^2$  $\sigma_{ac}^2$  $\sigma_{ab}^2$  $\sigma_c^2$  $\sigma_b^2$  $\sigma_a^2$
A          1             (t-2)/t           1                0                0                0             0             t
B          1             (t-2)/t           0                1                0                0             t             0
C          1             (t-2)/t           0                0                1                t             0             0
Residual   1             (t-3)/t           1                1                1                0             0             0

Note - t is the number of levels of A, B, and C in the experiment.


Case 1. One Double Interaction

The F tests of both variates included in the interaction are negatively
biased inasmuch as an inflation of the residual occurs, with a consequent
increase in the probability of a Type II error. The residual contains variance
due to error and variance due to interaction, whereas the mean squares of the
two interacting variates contain no interaction variance. The interaction
variance would have to be included in both the numerator and denominator
of the F test to achieve an unbiased result. The F test of
the other variate is unbiased but is not a valid F test, since for a valid F test
more requirements than freedom from bias are imposed. The interactions
contained in the mean squares of both the main effect and the residual must
be random, normally distributed, and be a component that would be included
in the mean square as indicated by the rule stated above. The F ratio in
this instance is a ratio of two noncentral chi square statistics divided by
their respective degrees of freedom, and the distribution depends upon the
parameters of the fixed effect interaction. However, this F test does give a
result above and beyond the interaction effect.


Case 2. Triple Interaction 


The F test of each of the three variates contains a small positive bias
inasmuch as the coefficient of $\sigma_{abc}^2$ for residual is less than the coefficient of
this term in A, B, and C.


Case 3. Two Double Interactions 


No matter which two are present, all three F tests are negatively biased 
because only one interaction term is contained in A, B, and C while two are 
present in the residual. 


Case 4. Three Double Interactions 


All F tests contain negative bias, but the bias is greater than in Case 3 
because three interaction terms are included in the residual and only one 
appears in each of the three main effects. 


Case 5. One Double Interaction and the Triple Interaction 


The results in this instance include both positive and negative biases.
The F test of the variate not included in the first-order interaction has a
small positive bias. The tests of the two variates included in the interaction
will be biased, with the direction depending on the relative size of the variance
due to the double and triple interactions. If the former is greater than $(1/t)$
times the latter, then negative bias occurs; if less than $(1/t)$ times the latter,
positive bias is present.


Case 6. Two Double Interactions and the Triple Interaction 


F tests tend to be negatively biased. However, positive bias may occur.
The F test of the variate which is included in both double interactions is
negatively biased if $(1/t)\sigma_{abc}^2$ is less than the sum of the two double inter-
action variances. Positive bias occurs if the reverse is true. The F test of
each of the other two effects is biased negatively if $(1/t)\sigma_{abc}^2$ is less than
the variance due to the interaction of that effect with the effect which is
common to both double interactions. For example, if AB, AC, and ABC
are present, negative F test bias will occur in the test of A if $(1/t)\sigma_{abc}^2$ is
less than $\sigma_{ab}^2 + \sigma_{ac}^2$; in the test of B if $(1/t)\sigma_{abc}^2$ is less than $\sigma_{ab}^2$; and in the test
of C if $(1/t)\sigma_{abc}^2$ is less than $\sigma_{ac}^2$. Of course, positive bias results if the inequality
is reversed.


Case 7. All Interactions


All tests are negatively biased unless the sum of the variances for the 
three double interactions is less than the variance due to the triple interaction. 

In summary, in the zero random variate model all tests except one are 
biased if one or more interactions appear. The unbiased result occurs when 
one double interaction is extant and the test is on the variate not included 
in the interaction. However, because of the presence of the fixed effect inter-
action in both the numerator and denominator, this unbiased test is not a
valid F test.
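
The direction of bias in each case of the zero random variate model can be checked numerically from the Table 1 coefficients. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the variance values passed in are arbitrary, and "bias" is taken to mean the expected numerator minus the expected denominator of the F ratio when the main effect being tested is zero (the error variance cancels).

```python
from fractions import Fraction


def zero_random_bias(effect, t, var):
    """Sign of F test bias for a main effect in the zero random variate model.

    effect: "a", "b" or "c"; t: number of levels (an integer);
    var: assumed interaction variances keyed by "ab", "ac", "bc", "abc"
    (missing keys count as zero). Negative result = negative bias.
    """
    def v(key):
        return Fraction(var.get(key, 0))

    other_pair = "".join(f for f in "abc" if f != effect)
    # Main effect under the null hypothesis (Table 1): ((t-2)/t) * abc
    # plus the double interaction of the other two factors.
    numerator = Fraction(t - 2, t) * v("abc") + v(other_pair)
    # Residual (Table 1): ((t-3)/t) * abc plus all three double interactions.
    denominator = Fraction(t - 3, t) * v("abc") + v("ab") + v("ac") + v("bc")
    return numerator - denominator


# Case 5 with assumed values: AB and ABC present, t = 4.
case5 = {"ab": 1, "abc": 2}
for effect in "abc":
    print(effect.upper(), zero_random_bias(effect, 4, case5))
```

With these assumed values the tests of A and B come out negative, because the AB variance exceeds $(1/t)$ times the ABC variance, while the test of C shows the small positive bias described in Case 5.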


One Random Variate Model 


The paradigm is shown in Table 2. With the introduction of a random
effect (C) more components appear, with $\sigma_{ac}^2$ and $\sigma_{bc}^2$ included as a component
for A and B, respectively. Furthermore, the coefficients of $\sigma_{abc}^2$ change to
$1 - (2/t)$ for the residual and to $1 - (1/t)$ for the random effect. The coefficients
remain as $1 - (2/t)$ for the two fixed effects.


TABLE 2

One Random Variate Model

EMS (Coefficients of)

Variates   $\sigma_e^2$  $\sigma_{abc}^2$  $\sigma_{bc}^2$  $\sigma_{ac}^2$  $\sigma_{ab}^2$  $\sigma_c^2$  $\sigma_b^2$  $\sigma_a^2$
A          1             (t-2)/t           1                1                0                0             0             t
B          1             (t-2)/t           1                1                0                0             t             0
C*         1             (t-1)/t           0                0                1                t             0             0
Residual   1             (t-2)/t           1                1                1                0             0             0

* Variates with asterisk in Tables 2-4 are random main effects.


Case 1. One Double Interaction

If the interaction involves the two fixed effects, negative F test bias occurs in
the tests of the fixed effects. No bias occurs in the test of the random effect.
However, this test is not distributed as F. If the interaction involves the random
effect, only the test of the random effect is negatively biased. Of the
two unbiased tests only that of the variate interacting with the random effect
is a valid F test.


Case 2. Triple Interaction 


The F tests of the two fixed effects are free from bias, but are not distri- 
buted as F. In the test of the random effect positive bias occurs. 


Case 3. Two Double Interactions 


If one interaction involves a random effect and the other contains both
fixed effects, all F tests are negatively biased. If both interactions involve
the random effect, negative bias occurs in the test of the random effect. The
tests of the two fixed effects are unbiased but are not distributed as F. Again
we have a ratio of two noncentral chi square statistics whose distribution
depends on the parameters of the fixed effects interaction included in the
mean square.


Case 4. Three Double Interactions 


All tests are negatively biased, but the bias is greater for the test of the 
random effect than for the tests of the two fixed effects because the random 
effect contains one interaction term while each of the fixed effects contains two. 


Case 5. One Double Interaction and the Triple Interaction 


If the fixed effects interact, then positive bias occurs in the test of the
random effect but negative bias appears in the tests of the fixed effects. If
the double interaction includes the random effect, then the tests of the two
fixed effects are unbiased but not valid F tests. The test of the random effect
is biased, with negative bias if $\sigma_{ac}^2$ or $\sigma_{bc}^2$ is greater than $(1/t)\sigma_{abc}^2$, or with
positive bias if the reverse occurs.


Case 6. Two Double Interactions and the Triple Interaction 


If one of the double interactions involves the random effect and the
other does not, the tests of the fixed effects are negatively biased; the test
of the random effect is negatively biased if the variance of the interaction
including the random effect is greater than $(1/t)\sigma_{abc}^2$, otherwise positive
bias occurs. If both double interactions include the random effect, the F
tests for the fixed effects are unbiased but are not valid F tests. The test of
the random effect is negatively biased if $\sigma_{ac}^2 + \sigma_{bc}^2$ is greater than $(1/t)\sigma_{abc}^2$;
otherwise positive bias occurs.


Case 7. All Interactions 

Tests of the fixed effects are negatively biased; the test of the random
effect is negatively biased if $\sigma_{ac}^2 + \sigma_{bc}^2$ is greater than $(1/t)\sigma_{abc}^2$; otherwise
positive bias occurs.


In summary, with one random effect included in the experiment more 
tests are unbiased even though interactions occur and some tests are distri- 
buted as F. 

Two Random Variates Model 

The paradigm for this model is shown in Table 3. With the appearance
of another random effect (B) the coefficients of $\sigma_{abc}^2$ change to $1 - (1/t)$ for
all four effects, and $\sigma_{ab}^2$ is included for A and $\sigma_{bc}^2$ for C.

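TABLE 3

Two Random Variates Model

EMS (Coefficients of)

Variates   $\sigma_e^2$  $\sigma_{abc}^2$  $\sigma_{bc}^2$  $\sigma_{ac}^2$  $\sigma_{ab}^2$  $\sigma_c^2$  $\sigma_b^2$  $\sigma_a^2$
A          1             (t-1)/t           1                1                1                0             0             t
B*         1             (t-1)/t           1                1                0                0             t             0
C*         1             (t-1)/t           1                0                1                t             0             0
Residual   1             (t-1)/t           1                1                1                0             0             0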


Case 1. One Double Interaction 


If the interaction involves the two random effects, all tests are unbiased
but only the tests of the random effects are distributed as F. If the inter-
action includes the fixed effect, two tests are unbiased, the test of the fixed
effect and that of the random effect not included in the interaction. The test
of the fixed effect is a valid F test but the F ratio for testing the random
effect is not distributed as F. The test of the random variate included in the
interaction is negatively biased.


Case 2. Triple Interaction

All F tests are unbiased but only the test of the fixed effect is valid and
distributed as F.


Case 3. Two Double Interactions 


If the interactions that are present both involve the fixed effect, tests
of the random effects are negatively biased. The test of the fixed effect is
unbiased and a valid F test. If one of the interactions includes both random
effects, the random effect included in both interactions is negatively biased
but the tests of the other two effects are unbiased but not distributed as F.


Case 4. Three Double Interactions 


The tests of both random effects have negative bias; the test of the fixed 
effect is unbiased but not a valid F test. 


Case 5. One Double Interaction and the Triple Interaction 


If the double interaction involves both random effects, all tests are
unbiased but not distributed as F. If the double interaction contains the
fixed effect, the test of the random effect involved in the interaction is nega-
tively biased; the tests of the two other effects are unbiased but only the test
of the fixed effect is distributed as F.


Case 6. Two Double Interactions and the Triple Interaction

If the fixed effect is involved in both double interactions, the test of
the fixed effect is a valid F test but the tests of the remaining effects are
negatively biased. If only one interaction includes the fixed effect, the random
effect that is included in both double interactions is negatively biased and
the tests of the other effects are unbiased but are not valid F tests.


Case 7. All Interactions

Tests of the random effects are negatively biased; the test of the fixed
effect is unbiased but not a valid F test, because a component ($\sigma_{bc}^2$) which
does not involve the fixed effect is included in the mean square of the fixed
effect.

In summary, if two random effects are included in the experiment one
or more tests are unbiased; the test of the fixed effect is usually unbiased and
distributed as F. All biased tests are negative.

Three Random Variates Model 


The paradigm is shown in Table 4. With all random effects present, the
coefficient for $\sigma_{abc}^2$ is unity for all effects. All interaction components appear
in the mean square of the three main effects as well as in the residual, indi-
cating that all tests will be unbiased, but not necessarily valid.


Case 1. Double Interaction

Only the tests of the effects included in the interaction are valid.

Case 2. Triple Interaction

All tests are valid.

Case 3. Two Double Interactions

Only the effect included in both interactions provides a valid F test.


TABLE 4

Three Random Variates Model

EMS (Coefficients of)

Variates   $\sigma_e^2$  $\sigma_{abc}^2$  $\sigma_{bc}^2$  $\sigma_{ac}^2$  $\sigma_{ab}^2$  $\sigma_c^2$  $\sigma_b^2$  $\sigma_a^2$
A*         1             1                 1                1                1                0             0             t
B*         1             1                 1                1                1                0             t             0
C*         1             1                 1                1                1                t             0             0
Residual   1             1                 1                1                1                0             0             0


Case 4. All Double Interactions 
No tests are distributed as F. 


Case 5. One Double Interaction and the Triple Interaction 
The results are the same as in Case 1, i.e., only the tests of the effects 
included in the double interaction are valid. 


Case 6. Two Double Interactions and the Triple Interaction 


Only the test of the effect which is included in all interactions is a valid
F test.


Case 7. All Interactions


No tests are valid.

Thus when all effects included in the experiment are random variates all
tests are free from bias regardless of how many interactions are present.
Furthermore, more valid tests occur than in the previous models.

In conclusion, it is interesting to note that as the number of random 
variates increases the number of unbiased F tests increases likewise until 
in the three random variates model all tests are free of bias. Paralleling this 
trend is an increasing number of valid F tests as well. In the first two models
negative bias occurs most frequently but small positive bias is possible when
the triple interaction is present. In the third model all biases are negative.
Thus as the number of random variates increases, more tests are insensitive
to deviation from the traditional assumption that the interactions are
zero. The most frequent occurrence of the single Latin
square design probably is represented by the [...] random variates
model in which subjects represents one of the random effects. [...]
thought, less bias will occur with interactions present inasmuch as some tests
will be unbiased; however, a fewer number will be valid. If one were to use
the derivation technique of Mood and McNemar, more tests would be 
unbiased and valid in the one and two random variates models since more 
components are included in the expected value of mean square of the three 
main effects by this procedure. 

Furthermore, contrary to McNemar's assertions, and in agreement with
Gourlay, most of the tests have negative bias. This indicates that the prob- 
ability of a Type II error increases with these Latin Square designs for some 
tests of significance. Therefore, if the investigator reports a significant result 
in most of the tests within the first two models, or in any of the tests of the 
third model, he can be safe in stating that the probability of such an event 
is at or below the probability level chosen. However, if significance is not 
indicated by some tests when an interaction is present (and unknown to the 
investigator), the stated results are not as certain. It appears that prior to 
using a Latin square design the investigator should familiarize himself with 
the various cases and be aware of possible distortions which may occur.

REFERENCES 

[1] Anderson, R. L. and Bancroft, T. A. Statistical theory in research. New York: McGraw-
Hill, 1952.

[2] Gourlay, N. F-test bias for experimental designs of the Latin square type. Psycho-
metrika, 1955, 20, 273-287.

[3] Greenwood, J. A. Analysis of variance and components of variance factorial experi-
ments. Unpublished paper, Bureau of Aeronautics, 1956 (revised).

[4] Kempthorne, O. The design and analysis of experiments. New York: Wiley, 1952.

[5] Lindquist, E. F. Design and analysis of experiments in psychology and education. New
York: Houghton Mifflin, 1953.

[6] McNemar, Q. On the use of Latin squares in psychology. Psychol. Bull., 1951, 48,
398-401.

[7] McNemar, Q. Psychological statistics. New York: Wiley, 1955.

[8] Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950.

[9] Tukey, J. W. Interaction in a row by column design. Memorandum Report 18, Prince-
ton Univ., 1949.

[10] Wilk, M. B. and Kempthorne, O. Non-additivities in a Latin square design. J. Amer.
statist. Ass., 1957, 52, 218-236.



Manuscript received 10/9/57 
Revised manuscript received 2/24/58 


PSYCHOMETRIKA—VOL. 23, NO. 4 
DECEMBER, 1958 


A MODIFICATION OF KENDALL'S TAU FOR MEASURING 
ASSOCIATION IN CONTINGENCY TABLES 


BERTRAM P. KARON AND IRVING E. ALEXANDER
PRINCETON UNIVERSITY 


A coefficient of association $\tau'$ is described for a contingency table contain-
ing data classified into two sets of ordered categories. Within each of the two
sets the number of categories or the number of cases in each category need
not be the same. $\tau' = +1$ for perfect positive association and has an expectation
of 0 for chance association. In many cases $\tau'$ also has $-1$ as a lower limit. The
limitations of Kendall's $\tau_a$ and $\tau_b$ and Stuart's $\tau_c$ are discussed, as is the
identity of these coefficients to $\tau'$ under certain conditions. Computational
procedure for $\tau'$ is given.

Consider a contingency table consisting of two sets of ordered categories 
in which the numbers of categories within each set or the numbers of cases 
within each category are not identical. Such sets of ordered categories may 
be thought of as rankings with ties. It is well known that Kendall's $\tau$ [1]
may be applied to contingency tables treating the categories as tied ranks. 
However, limitations in its application have recently been pointed out [3]. 

In the non-tied case, Kendall's coefficient has the following desirable
properties.

(i) If agreement between two rankings is perfect, $\tau = +1$.
(ii) If inverse agreement between two rankings is perfect, $\tau = -1$.
(iii) If there is chance agreement between the two rankings, $E(\tau) = 0$.
(iv) The sample $\tau$ is an unbiased estimate of the parametric value for
the population from which the sample is randomly drawn.
(v) The sampling properties of $\tau$ are known and statistical test proce-
dures are available.


One way of defining $\tau$ for the case where there are no ties is

(1)    $\tau = \frac{S}{S_{\max}}$.

S is the number of pairs of objects which are in the same order in both rankings
minus the number of pairs of objects in which the order is reversed; $S_{\max}$ is
the maximum value attainable by S when agreement is perfect.

Since for n objects there are $n(n - 1)/2$ pairs of objects the maximum
attainable value of S is $n(n - 1)/2$. Thus $\tau$ may be defined as

(2)    $\tau = \frac{S}{\frac{1}{2}n(n - 1)}$,

which clearly varies from $+1$ to $-1$.



If there are one or more ties in either ranking, (1) is no longer identical 
with (2), since pairs which are tied are not counted in S. Tied objects are 
considered to have no order with respect to each other and therefore their 
order in the ranking in which they are tied can neither agree nor disagree 
with their order in the other ranking. Thus (2) can never reach +1 or —1. 
For such cases, Kendall defines (2) as $\tau_a$, where suggested usage is confined
to the situation in which the ties arise from an inability of the ranker to 
comply with instructions to carry out a complete ranking. 

Kendall then defines $\tau_b$ to account for all other cases of tied ranks,

(3)    $\tau_b = \frac{S}{\sqrt{[\frac{1}{2}n(n - 1) - T][\frac{1}{2}n(n - 1) - U]}}$.

T may be written

(4)    $T = \frac{1}{2}\sum_r t_r(t_r - 1)$,

where $t_r$ is the number of objects tied for rank r of one of the rankings, and

(5)    $U = \frac{1}{2}\sum_s u_s(u_s - 1)$,

where $u_s$ is the number of objects tied for rank s of the other ranking. For
any rank p in which there are no ties,

(6)    $t_p(t_p - 1) = 0 = u_p(u_p - 1)$.


Therefore the summations in (4) and (5) need be taken only over the tied 
ranks, and when no ties occur in either ranking, (3) reduces to (2). 

If the number of categories in each ranking and the number of cases in
each category are the same in both rankings, $\tau_b$ can attain $+1$; if the number
of categories is the same and the number of cases in each category is the
same in reverse order, $\tau_b$ can attain $-1$; if the number of categories is the
same, and there is symmetry such that the number of cases in each category
is the same in either order, $\tau_b$ can attain both limits.

Stuart [3] notes that even $\tau_b$ cannot attain in all cases the limits of $+1$
and $-1$. Stuart defines a coefficient $\tau_c$ for the general $r \times s$ contingency table,

(7)    $\tau_c = \frac{S}{n^2(m - 1)/(2m)}$,

where m is the number of rows or the number of columns, whichever is
smaller. If n is an even multiple of m, $\tau_c$ can attain the limits $+1$ or $-1$
when the observations all lie in the cells of a diagonal, and the number of
cases in each cell of that diagonal are equal. If n is not an even multiple of
m then $\tau_c$ cannot reach these limits, but, according to Stuart, $\tau_c$ will approach
them closely for large n.


However, for small n (when n is not a multiple of m) the discrepancy
is considerable, and $\tau_c$ is not satisfactory. More serious is the fact that even
when n is a multiple of m, and even when n is large, $\tau_c$ cannot attain either
$+1$ or $-1$ except for the special case when the number of objects in each
of the categories of the two rankings are arranged to allow the number of
cases in each diagonal cell to be equal. Such special cases are not common.

A simple coefficient for all sets of ordered categories, one that includes
the desired properties, is readily available. Equation (1) may be used to
define a coefficient $\tau'$. It is evident that for all situations, $\tau'$ can reach a
maximum of $+1$ only for as perfect an agreement as is attainable given the
particular $r \times s$ table, and has an expected value of 0 when there is a chance
agreement. It is also clear that $\tau_a$, $\tau_b$, and $\tau_c$ are all equal to $\tau'$ in the cases
where they have the desired properties outlined for $\tau$. In cases where no
formula is readily available for $S_{\max}$, it is simple to calculate numerically.
All that is necessary is to fill in the $r \times s$ table under consideration as the
observations would occur if association were perfect, and compute S in the
same fashion as one computes the observed S.

To illustrate the computation of $\tau'$ consider an example from Macht [2].
From the theory to be tested, 10 objects are ranked. The criterion is such
that only the first 5 may be ranked, while the other 5 objects are placed
into one ordered category.


                        Theory

               1  2  3  4  5  6  7  8  9  10

            1  0  0  1  0  0  0  0  0  0  0
            2  1  0  0  0  0  0  0  0  0  0
            3  0  1  0  0  0  0  0  0  0  0
Criterion   4  0  0  0  1  0  0  0  0  0  0
            5  0  0  0  0  0  1  0  0  0  0
            6  0  0  0  0  1  0  1  1  1  1

In this sample the rows represent categories ordered according to the
criterion and the columns represent the ranks according to the theory.
Therefore, if an observation falls in a row below that of another observation
it is ranked lower by the criterion, and if an observation falls in a column
to the right of that of another observation, it is ranked lower by the theory.
If an observation is both below and to the right of another, it is ranked
lower by both the criterion and the theory, and the comparison between
the two is in agreement on both rankings. If an observation falls below, but
to the left of another, it is ranked lower by the criterion but higher by the
theory, and the comparison between the two is in disagreement. If an observa-
tion falls in the same row or in the same column as another, it is tied on either
the criterion or the theory, and therefore the comparison between the two
can neither agree nor disagree. In order not to make the same comparison
more than once, each observation may be compared only with those which
fall in rows lower than its own. Thus, one would compare the observations
in each row only with those in lower rows.

Now S may be computed, as indicated by Kendall, in the following
manner.

(i) For each cell of the table, count and record the number of observa-
tions which fall in cells which are both below and to the right of
that cell.

(ii) Subtract the number of observations which fall in cells both below
and to the left of that cell.

(iii) Do not count or subtract any observations which lie directly below,
or fall on the same horizontal line, or lie above the cell.

(iv) Multiply the result for each cell by the number of cases in that cell.

(v) Sum the results for each cell over the whole table.

For this example,


(8)    $S = (7 - 2) + 8 + 7 + 6 + (4 - 1) = 29$.

To determine $S_{\max}$, write a contingency table showing perfect prediction of
the criterion by the theory,

                        Theory

               1  2  3  4  5  6  7  8  9  10

            1  1  0  0  0  0  0  0  0  0  0
            2  0  1  0  0  0  0  0  0  0  0
            3  0  0  1  0  0  0  0  0  0  0
Criterion   4  0  0  0  1  0  0  0  0  0  0
            5  0  0  0  0  1  0  0  0  0  0
            6  0  0  0  0  0  1  1  1  1  1

Here,

(9)    $S_{\max} = 9 + 8 + 7 + 6 + 5 = 35$.

$\tau'$ can vary from $+1$ to $-1$, and for this example

(10)    $\tau' = \frac{S}{S_{\max}} = \frac{29}{35} = .83$.
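
The counting steps for S can be sketched in Python as a check on this example. This is a minimal illustration, not the authors' own procedure as code: the function name `kendall_s` is an assumption, and the two tables are entered by hand as they are read from the example (criterion categories as rows, theory ranks as columns).

```python
def kendall_s(table):
    """Kendall's S for a contingency table given as a list of rows.

    For every cell, observations below and to the right count as
    agreements and observations below and to the left as disagreements;
    cells in the same row or the same column are ignored as ties.
    """
    n_rows, n_cols = len(table), len(table[0])
    s = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if table[i][j] == 0:
                continue
            below_right = sum(table[k][l]
                              for k in range(i + 1, n_rows)
                              for l in range(j + 1, n_cols))
            below_left = sum(table[k][l]
                             for k in range(i + 1, n_rows)
                             for l in range(j))
            s += table[i][j] * (below_right - below_left)
    return s


# Observed table of the example (rows: criterion, columns: theory).
observed = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
]
# Table showing perfect prediction of the criterion by the theory.
perfect = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
]

s = kendall_s(observed)
s_max = kendall_s(perfect)
tau_prime = s / s_max
print(s, s_max, round(tau_prime, 2))
```

The perfect-association table is supplied by hand here, as in the text; the sketch does not try to construct it from the marginal totals.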


It is interesting to note the values reached by $\tau_a$, $\tau_b$, $\tau_c$, and $\tau'$ when
the theory predicts the criterion perfectly, and when consequently a satis-
factory coefficient should equal 1.

(11)    $\tau_a = .78$;
(12)    $\tau_b = .88$;
(13)    $\tau_c = .84$;
(14)    $\tau' = 1.00$.
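
These four values can be reproduced from the definitions in (1), (2), (3), and (7). The sketch below is illustrative only: the helper name `tie_term` is an assumption, the table is the perfect-prediction table of the example entered by hand, and S is taken as 35 from (9) rather than recomputed.

```python
from math import sqrt


def tie_term(marginals):
    """Half the sum of t(t - 1) over marginal totals, as in (4) and (5)."""
    return sum(t * (t - 1) for t in marginals) / 2


# Perfect-prediction table (rows: criterion, columns: theory).
perfect = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
]

s = 35                                    # S for this table, from (9)
s_max = 35                                # same table, so S equals S_max
n = sum(sum(row) for row in perfect)      # number of objects
pairs = n * (n - 1) / 2                   # n(n - 1)/2
T = tie_term([sum(row) for row in perfect])        # ties on the criterion
U = tie_term([sum(col) for col in zip(*perfect)])  # ties on the theory
m = min(len(perfect), len(perfect[0]))    # smaller of rows and columns

tau_a = s / pairs
tau_b = s / sqrt((pairs - T) * (pairs - U))
tau_c = s / (n * n * (m - 1) / (2 * m))
tau_prime = s / s_max

print(round(tau_a, 2), round(tau_b, 2), round(tau_c, 2), tau_prime)
```

Only the five objects tied in the last criterion category contribute to the tie correction, which is why $\tau_b$ falls short of 1 even though the prediction is perfect.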


$\tau'$ seems the most satisfactory coefficient. It always has $+1$ as an upper
limit indicating perfect association, and it always has 0 as its expected value
when there is a chance association, since its numerator S has an expected
value of 0. In addition, it will have $-1$ as a lower limit indicating perfect
negative association for those situations in which $S_{\min} = -S_{\max}$. Where
$S_{\min} \neq -S_{\max}$, if an attainable lower limit of $-1$ for perfect negative asso-
ciation is more important than an attainable upper limit of $+1$ for perfect
positive association, $-S_{\min}$ may be used in the denominator of (1) in place
of $S_{\max}$.
It should also be noted that a test of significance for the hypothesis of
no association exists, since, for all tau statistics, the test of significance is
based on the distribution of S and not of $\tau$. Thus the change in the denominator
in defining $\tau'$ affects only the measurement of association, and not the test
of significance. The distribution of S under the hypothesis of no association
has been discussed by Kendall [1].
REFERENCES

[1] Kendall, M. G. Rank correlation methods. London: Griffin, 1948.

[2] Macht, [...]. [...] occupational preference. Unpublished undergraduate thesis,
Princeton Univ., 1957.

[3] Stuart, A. The estimation and comparison of strengths of association in contingency
tables. Biometrika, 1953, 40, 105-110.


Manuscript received 10/15/5? _ 
Revised manuscript received 1/28/58 


BOOK REVIEWS 


GEORGE S. WELSH AND W. GRANT DAHLSTROM. Basic Readings on the MMPI in Psychol-
ogy and Medicine. Minneapolis: University of Minnesota Press, 1956. Pp. xvii +
656.

A collection of 66 papers and a 698-item bibliography provide a systematic com- 
pilation of representative information on the most prominent of all personality inventories. 
The collection is an excellent representation of papers interesting to the clinician, covering 
the basic validations of the instrument, the principal papers on fringe scales for dominance, 
ego-strength, and so on, and naturalistic reports on patient groups whose diagnoses range 
from alcoholism to cancer. The articles are skillfully edited to avoid duplication, and new 
papers written specially for the volume fill critical gaps. 

The reports of the scale development present a fascinating example of the research 
process in the hands of flexible investigators who can be honest with themselves. The first 
papers (ca. 1940) optimistically embarked on developing quantitative discriminant scales 
for psychiatric diagnosis. As the validities proved disappointing, trial and error was used 
in the hope of improvement. One hopeful attack introduced suppressor items, and later 
(1946) the K scale to correct for test-taking attitudes. Even the corrected scales were not 
very much in agreement with diagnoses, and from this date forward the papers increasingly 
deny that prediction of such criteria is or should be the function of the test. The most 
meaningful subsequent research is directed to connecting scales and patterns with general
descriptive constructs.


Considering the prominence the MMPI attained by virtue of its timely appearance,
just as the war thrust new demands upon clinical psychology, it is incredible that its foundations
are so shaky. The authors, though painstaking, established weights for items on
tiny clinical samples. The sample N's for original item selection were as follows: Hs, 50;
D, 50; Pt, 20; Hy, several samples; Ma, 24; Pd, 100 plus a second sample of unstated N;
etc. The fact that MMPI scales have worked at all on cross-validation conflicts sharply
with the opinion of many authorities that samples for establishing item weights should be
500 or larger. The authors freely criticized their own scales, repeatedly using such words as
"disappointing" and "weak." Three of the main scales, indeed, were released with the
intention of replacing them later.
Hindsight indicates that the inventory was designed quite inefficiently. The T-F-?
pattern invites maximal distortion by response sets; the indiscriminate mixing of obvious
[illegible] the interpreter from capitalizing on the virtues of either type;
the weighting of items which differentiate patient groups [illegible]
out of consideration items which differentiate patient [illegible]
to undesirably high scale intercorrelations. [In a patient [illegible]
of 36 are between .60 and .86 (p. 259).] Moreover, [illegible]
between MMPI scales and the CPI scales derived by [illegible] the same
items indicate that the basic MMPI scores extract only a small fraction of the [illegible]
in the responses.
The papers presented show consistent but weak relations between MMPI and [illegible]
indicate just how much confidence can be placed in any particular
interpretation. A [illegible] of studies suggest that clinicians
[illegible] patients [illegible] with about 70 per cent success, in experiments where chance
success would be 50 per cent. [illegible] studies in this volume bear on the more pertinent
[illegible] are taken into account.
[illegible] model studies. Few studies contain [illegible]
but [illegible] analysis and conceptualizations which have [illegible].
The investigations which seem most meritorious in terms of
[illegible]ness and informativeness are those by Black on college girls (p. 151), Peterson on
predicting hospitalization of outpatients (p. 407), Schiele and Brozek on experimental
starvation (p. 461), and Barron on ego-strength (p. 579). Two of the four are written for this
volume. Some older studies show several faults in addition to neglect of base rates and 
use of small samples. In studies of group differences the significance tests outnumber the 
subjects as much as 20 to 1; under these conditions, probability values are meaningless. 
"Signs" are generally cross-validated in a suitable manner, but in one instance (p. 311) 
the signs are revised on the second sample and significance is then tested on the same cases 
for the revised signs. Elsewhere (p. 330) signs are claimed to give a significant χ² for selection
of teachers, but correct application of a 2 × 2 table would show nonsignificance. Outside
of two papers by Gough, insufficient recognition is given to facade or “hello-goodbye” 
effects. 

In this book, as in other MMPI literature, great emphasis is placed on "patterns."
This term is applied indiscriminately, and the experimental designs rarely bear on the
conclusions drawn. Conclusions of the form “Pt is higher than Sc in group X” should be 
tested by showing what proportion of group X has (Pt — Sc) > 0. The majority of studies 
make such interpretations from inspection of the mean difference, and compute significance 
by showing that one or both scores taken separately differ from the normal—a finding 
irrelevant to the conclusion. As in this example the so-called "patterns" are often no more 
than simple difference scores. The writer would advocate reserving the terms "pattern" 
and “configuration” for conjunctive, nonlinear formulas. Where nonlinear formulas are 
offered (for example in Welsh’s internalization ratio), the claim for configural validity 
ought to be supported by showing that this statistically awkward form is actually more 
valid than a suitable linear composite. The study by Little and Shneidman (p. 332), the 
most truly configural in the book, requires special criticism. A Q-sort of a single case was 
made by MMPI judges and Q-correlated with the sort made on the basis of the clinical 
case record. The average validity is said to be .67. Such an index is meaningless by itself, 
since selection of statements could swing the validity in either direction by almost any 
amount. In this case it reflects chiefly an impression of bad adjustment (four T-scores 
are over 90!) rather than a “personality pattern." The Q-sort for any other patient with 
manifest extreme disturbance would surely correlate highly with the criterion for this 
man. As a minimum, the authors should show how the correlation for A’s MMPI sort with 
A's criterion sort compares to correlations of MMPI sorts for patients B, C, and D with 
A’s criterion sort. 
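
The test recommended above for a conclusion of the form "Pt is higher than Sc in group X" can be sketched as follows. This is only an illustration: the function name, the handling of ties, the choice of an exact sign test, and the eight pairs of T-scores are all assumptions made here and come from none of the studies reviewed.

```python
from math import comb

def proportion_first_exceeds(first, second):
    """Proportion of cases whose first score exceeds the second (ties dropped),
    with an exact two-sided sign-test p-value against a 50:50 split.
    Assumes at least one untied pair."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Exact binomial tail: chance of a split at least as lopsided as k of n.
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return k / n, p_value

# Hypothetical T-scores for eight patients in "group X" (illustrative only).
pt = [72, 68, 80, 75, 66, 71, 79, 70]
sc = [65, 70, 74, 69, 60, 66, 73, 64]
proportion, p_value = proportion_first_exceeds(pt, sc)
```

The point of the sketch is that the statistic is computed on the within-person difference Pt − Sc, not on each scale's separate departure from the normal mean.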

One can only commend the compilers for providing so adequate a picture of MMPI 
history. A reader is left with respect for the progress we have made, particularly in discard- 
ing over-optimistic expectations. He should also be dismayed that so much conscientious 
effort on our most carefully planned personality inventory leaves us in our present state. 
Interpretation of MMPI profiles rests far more on the interpreter's "experience" than on 
validated principles. Within any typical intake group in a clinic or a student body, it seems 
quite doubtful that one can consistently make dependable inferences about an individual’s 
degree of disturbance or personality structure. It is fortunate that so many of these papers 
urge that the test be used only tentatively, as a supplement to other means of investigating 
the individual. 


LEE J. CRONBACH
University of Illinois


PHILIP H. DUBOIS. Multivariate Correlational Analysis. New York: Harper & Brothers,
1957. Pp. xv + 202.


This book is concerned with the descriptive statistics of multiple linear regression
systems. It appears to have been written chiefly for the practicing research worker in the
social sciences who has a sound grasp of elementary descriptive statistics. Most of the
topics are presented from a unifying viewpoint most clearly expressed in what the author
calls the basic theorem—actually a lemma—of correlational analysis: 

Every value of a variate may be divided uniquely into two uncorrelated components:
a portion perfectly correlated with an outside variate and a portion uncorrelated
with the outside variate.

Not only does this lemma serve to unify the discussion of various topics and procedures, 
but also to guide the research worker in their use. 
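
A small numerical illustration of the lemma may help. The sketch below assumes the "portion perfectly correlated with an outside variate" is the least-squares linear prediction and the other portion is the residual. The function names and the five data points are hypothetical and are not taken from the book.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

def split_on_outside_variate(y, x):
    """Divide each value of y into a part linear in x and a residual."""
    slope = covariance(x, y) / covariance(x, x)
    mx, my = mean(x), mean(y)
    predicted = [my + slope * (a - mx) for a in x]
    residual = [b - p for b, p in zip(y, predicted)]
    return predicted, residual

# Hypothetical scores: x is the outside variate, y the variate to be divided.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
predicted, residual = split_on_outside_variate(y, x)
```

In this example the predicted part correlates perfectly with x, the residual correlates zero with x, and the two variances (2.00 and 0.96) add to the variance of y (2.96).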

Valuable as the lemma is, as a unifying conception, the major focus of the book is 
upon the regression techniques, which explicitly exhibit the lemma in action. This is achieved 
by the author's development of Yule's partial covariance formulation into a general com- 
puting routine, based on the method of single division. 

In the first nine sections, multiple, partial, and part correlation are discussed in
terms of the partial covariance concept and computing routine. With the appropriate
adaptation of the routine, forward solutions yield directly multiple R, partial correlations
of any order, multiple-partial, and multiple part correlations. Corresponding regression
coefficients may be found by completing the back solution. The general computing routine is
simple, efficient, and includes systematic computational checking procedures. The author
also presents an abbreviated method, and a Wherry-Doolittle type of procedure for selecting
a subset of predictors.

The procedures consist of systematic elimination of variance associated with each
predictor. At each stage of the analysis partial variances and covariances for remaining
"residual" variables are exhibited. This conception of correlational analysis is particularly
advantageous in understanding correlations involving residuals, [illegible]
multiple-partial, and multiple part correlations. This is the clearest presentation of these
topics seen by the reviewer. With the exception of partial correlation, it is not uncommon
to see these matters dismissed in a brief paragraph containing a formula with no discussion
of meaning or use; the reader is left to interpret juggled subscripts. The author deals with
these subjects effectively and insightfully.

Correlations involving residuals may be used to introduce controls statistically in
situations where they cannot be introduced experimentally. As [illegible]
part correlation, the author cites the frequently encountered problem where [illegible]
differences on the criterion variable existing prior to [illegible]
an aptitude predictor and post-training [illegible]
of initial level of performance from the measurement of final level but not from the aptitude
predictor. The same logic applies to the use of the multiple [illegible]
of more than one predictor. Similar procedures may be used to purify a criterion of variance
irrelevant to the purpose of an investigation.
Two sections are devoted to factor analysis. No attempt is made to present a systematic
discussion of this topic, but rather relations are demonstrated between factor analysis
and multiple regression systems, as analyzed in terms of partial covariances. A method
is presented in which pivoting is done on a component defined by variance common to
at least two variables. Starting with uni-factor reference variables for the first factor,
two-factor reference variables for the second factor, etc., the analysis proceeds in terms
of partial covariances by an adaptation [illegible]
are given for picking factor-defining variables [illegible] of the analysis.
This method of factor analysis leads to [illegible] be interpreted
invariantly across equivalent samples if suitable reference variables are introduced in
each analysis. The procedures lead to a definition of [illegible]
circumventing the problem of initial communality estimation. [illegible]
of residual [illegible] orthogonal [illegible]. Further extraction
and rotation may be introduced, [illegible]. Although universal
applicability of the method is not claimed, it is a useful one when dealing with rather
well understood domains in applied psychology. The computational effort is judged to be 
approximately equivalent to that of the centroid method. 

Readers with primary interest in statistics may prefer to start with a section entitled
"Some Mathematical Considerations." This section develops and summarizes in mathematical
terms relations discussed in the first ten sections. However, it is not at all formidable
for the less sophisticated reader. The section closes with a discussion of the relationship
between the author's and Yule's formulas. A number of topics receive very brief treatment
in a section on inference and prediction. Included are corrections for attenuation,
inferring by factorial extension correlations missing from a matrix, alienation and the 
standard error of estimate, shrinkage, the reliability of a residual variable, and the standard 
error of multiple correlations. A final section deals with the role of correlation analysis 
in social science research. The role is illustrated with six uses of the basic lemma. There 
follows a brief discussion of closed systems such as those generated by forced choice scores. 

In spite of its title, the scope of this book is limited by exclusion of many topics falling
in the general domain of correlational analysis. One will not find any discussion of pattern
analysis, multivariate models for norming and scaling, corrections of correlation matrices
for effects of selection on more than one variable, canonical regression, image and radex 
models, discriminant functions, or analysis of dispersion. Within the scope chosen by the 
author, concepts and working procedures are clearly presented. Some readers may prefer 
some geometric presentation to supplement the verbal and algebraic approach used. The 
book can be used effectively as a text in courses with similar scope and level; however, 
many instructors will want to supplement the text with more examples of situations re- 
quiring the use of the procedures and with more thorough discussion of some of the topics 
presented. 

The format of the book includes a list of 36 references, a table of R as a function of
partial variance, that portion of a square root table most useful in the general computing
routine, and a computing chart with directions for its use. A useful glossary and name and
subject indices complete the format. The publisher's use of linotype Caledonia makes an
attractive and readable page. Typographical errors are rare and of minor consequence.

In summary, this book, though limited in scope, will be useful and clarifying for many
readers. Regular readers of this journal will probably find much of the discussion elementary,
but may nevertheless profit from looking at familiar things from a refreshing viewpoint.
JOHN A. CREAGER
Wright Air Development Center
Lackland Air Force Base, Texas


JOHN B. MINER. Intelligence in the United States. New York: Springer Publishing Company,
1957. Pp. xii + 180.


About a third of American students below the tenth grade and over nine years old
could be doing work at college sophomore level or above. Such an assertion is arresting
in a period of concern over utilization of intellectual resources. The contributions offered
in this survey are, first, information on the distribution of verbal ability in the American
population drawn from a cross-sectional national sample. Further, these novel data are
compared with distributions in a model society with a perfect correlation between verbal
ability and educational level or occupation. A statistical analysis of the reshuffling required
in different subgroups yields information on the location of talent reserves.

Why verbal ability? The study is a somewhat accidental outgrowth of standardization
research requiring a fifteen-minute doorstep intelligence test. Not unwisely, the
author chose a vocabulary test as a close approximation in the time available. It would


be hard to dispute the author's argument that verbal ability is more crucial to success in 
our educational hierarchy than is any other special ability. His use of the term intelli- 
gence test to describe the measure is objectionable if the book has a nonprofessional audi- 
ence. Probably a vocabulary test is even more sensitive to educational effects than most 
general intelligence tests. Although Miner warns that environmental stimulus potential 
is crucial to the level of functioning at a given point, his own preference for the term
intelligence test suggests an underplaying of this factor. 

This is the first cross-sectional national sample of ability available. It is unfortunate 
that the sampling technique used—a combination of cluster and quota sampling—will 
not permit establishing confidence limits. The original sample of 1500 deviated sufficiently 
from census characteristics to require additional quota-selected subjects to the number of 
a quarter of the final sample, suggesting that there was some bias in sample selection. As 
a result there are unknown effects on cell distributions even when marginal frequencies 
have been matched with the census, and unknown effects on the ability measure. Since 
more active, younger, more talkative respondents usually appear in quota samples one 
would guess that the estimates of ability are biased upward. There may be no differ- 
ential effects on subgroups in the samples, however, and for Miner’s analysis differential 
effects are crucial. 

Even had the sample been drawn with appropriate known-probability methods, the 
standard error would be systematically underestimated, since corrections for clustering 
were not used. In mitigation of these shortcomings one can note that the differences Miner
found are generally substantiated by other studies of selected samples, such as the Army 
recruit studies and Wolfle’s school AGCT samples. 

Differences appeared in relation to education, occupation, rural-urban residence, 
geographical area, race, class identification, and religion. All these differences are compared 
to those found in other studies with coverage of relevant literature. One of the useful 
findings that appears repeatedly throughout the analyses is that the verbal knowledge 
distribution in housewives and retired persons does not deviate significantly from that of 
persons in the labor force, when educational differences are controlled in the older persons. 

Miner next proceeds to an analysis of discrepancies from the perfect correlation
model, using two methods. The tenth percentile of vocabulary scores is taken as the minimum
current entry level for any given school grade or occupational stratum, with occupations
divided into four skill strata. Those currently placed below that level [illegible]
scores above the minimum entry point for a higher level are considered underplaced.
Since the correlation of verbal knowledge with both [illegible],
there is inevitably a large proportion in the lower strata who are underplaced. But Miner
is not interested in over-all underplacement so much as in differential underplacement in
various subgroups.
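
The first method can be sketched in a few lines. Several details here are assumptions introduced for illustration: the nearest-rank percentile convention, the comparison of each stratum only with the next one up, the function names, and the three small sets of vocabulary scores. None of them is taken from Miner's book.

```python
def tenth_percentile(scores):
    """Nearest-rank 10th percentile (an assumed convention)."""
    ordered = sorted(scores)
    rank = max(1, -(-len(ordered) // 10))  # ceil(n / 10), at least 1
    return ordered[rank - 1]

def underplaced_fraction(scores_by_level):
    """scores_by_level lists the scores in each stratum, lowest stratum first.
    Returns, for every stratum but the top one, the fraction of its members
    scoring above the minimum entry point of the next stratum up."""
    entry = [tenth_percentile(scores) for scores in scores_by_level]
    fractions = []
    for level, scores in enumerate(scores_by_level[:-1]):
        eligible = sum(score > entry[level + 1] for score in scores)
        fractions.append(eligible / len(scores))
    return fractions

# Hypothetical vocabulary scores for three strata (illustrative only).
low = [3, 4, 5, 5, 6, 7, 8, 9, 10, 12]
mid = [6, 8, 9, 9, 10, 10, 11, 12, 13, 14]
high = [11, 13, 14, 15, 16, 17, 18, 19, 20, 20]
fractions = underplaced_fraction([low, mid, high])
```

Because the entry point is set at the bottom tenth of each stratum, a large share of the stratum below will clear it whenever the score distributions overlap, which is why over-all underplacement is less informative than differences between subgroups.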

Thus, whites are found to be more often educationally underplaced than negroes,
and unskilled underplaced more than highly skilled workers. Within the occupational
system, the less educated and working class persons could more often take a more highly
skilled occupation in stride.

Since [illegible] are not supplied, it is [illegible] discrepancies
exist. A group would have less underplacement if [illegible]
with vocabulary than in the total sample; (b) the group is highly placed already; (c) the
group is more homogeneous than the total sample and is [illegible] on both measures. Looking
at the vocabulary data it appears that (c) applies to negroes, and (b) to highly skilled
persons, [illegible].

The second technique for [illegible] analysis of efficiency in [illegible]
postulating a system in which there is a perfect correlation between [illegible]
educational or occupational stratification, with no changes in the numbers at [illegible].


On the conservative assumption of no changes in skill distribution, there would be de- 
motions as well as promotions to achieve the ideal system. Given these assumptions, the 
cut-off points are now raised because of the higher standards in a nonoverlapping strati- 
fication system. Since promotion is now available only to those at the top of their present 
level, a shift in the subgroup difference results occurs. 

Naturally, educational promotion rates are greater for the lower grades and demotion 
greater at higher levels because of ceiling effects. For instance, 3.9 per cent of the students 
below ninth grade would be promoted to college sophomore level or higher, and 21.7 per 
cent of the college students would be demoted in compensation. Differential results appear 
only for sex, with more girls being demoted, corroborating Wolfle’s finding of relatively 
higher grades for girls with AGCT controlled. 
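
The perfect-correlation model with fixed numbers at each level can be illustrated by a simple rank reassignment. The sketch below is an assumption about how such a reshuffle could be computed, not Miner's procedure. The function names and the nine hypothetical persons are invented for illustration.

```python
def reshuffle(people):
    """people is a list of (score, current_level) pairs, level 0 being lowest.
    Returns each person's level, in input order, after levels are refilled
    strictly by score rank while the number at each level stays unchanged."""
    n_levels = max(level for _, level in people) + 1
    counts = [0] * n_levels
    for _, level in people:
        counts[level] += 1
    # Indices of people from lowest to highest score.
    order = sorted(range(len(people)), key=lambda i: people[i][0])
    ideal = [0] * len(people)
    position = 0
    for level, count in enumerate(counts):
        for i in order[position:position + count]:
            ideal[i] = level
        position += count
    return ideal

def promotion_demotion_rates(people):
    """Fractions of all persons moved up and moved down by the reshuffle."""
    ideal = reshuffle(people)
    promoted = sum(new > old for new, (_, old) in zip(ideal, people))
    demoted = sum(new < old for new, (_, old) in zip(ideal, people))
    return promoted / len(people), demoted / len(people)

# Hypothetical (vocabulary score, current level) pairs (illustrative only).
people = [(5, 0), (9, 0), (14, 0), (7, 1), (12, 1), (16, 1),
          (10, 2), (18, 2), (20, 2)]
ideal_levels = reshuffle(people)
promoted_rate, demoted_rate = promotion_demotion_rates(people)
```

Since the level sizes are held constant, every promotion must be paid for by a demotion somewhere in the system, and only those at the top of their present level move up.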

In the occupational system, women would be more often promoted and men de- 
moted, possibly due to underemployment of women in the highest skill level. More de- 
motions would occur in the higher education and class levels than in the lower groups. 

The one apparently paradoxical result in these analyses is the finding that negroes 
are not relatively more underplaced, but on the contrary are in some cases less under- 
placed than whites. Here is one of the pitfalls of Miner’s method, which he notes but 
cannot surmount. He is comparing two measures both influenced by environmental stimu- 
lus potential and is trying to interpret discrepancies. As long as the chief contributor to 
variance in the verbal knowledge measure is individual ability, discrepancies can be sen- 
sibly interpreted and policy implications considered. Such is probably not the case with 
negro scores. In the educational analysis, the norms for a given grade are derived from 
pooling all schools. It is quite probable that standards for schools which negroes attend 
are sufficiently beneath the white averages to lower vocabulary exposure for negroes and 
thus reduce underplacement proportions. Anastasi's New York studies suggest that additional
discrepancies in verbal exposure histories predate school experience.

What does it mean then to say that adult negroes are by and large not capable of 
further education? The core of the contradiction lies in the contribution of education 
itself to the measure of capacity for education, and the failure to equate successfully past 
education. The same difficulty appears in analysis of adult educational potential when it 
is assumed that to be capable of a college education an adult with a grade school education 
should have the same verbal ability scores as students now in college. Miner emphasizes 
motivation and education in his treatment of group differences in verbal knowledge; he 
notes both in the introduction and summary that he is talking about current and not in 
any sense potential functioning. But his underplacement results are often phrased in a 
manner that obscures this qualification, which, from a public policy standpoint, is of
great importance.

Miner's data is of more interest than the policy suggestions which he proposes without
any extensive considerations of alternative factors (which do, and in some cases should,
reduce the correlation of verbal ability and education or occupation). In his educational
utopia, grade placement would entirely ignore chronological age and be based on achievement
criteria. The problems raised in such a system are dismissed rather summarily although
this kind of proposal has been controversial for some time. Of perhaps greater
interest from the view of public policy is the evidence by one criterion of considerable
underusage educationally and occupationally of large numbers of persons both in the
labor force and in retirement. In Miner's society there is no reward for effort, no pay-off
for independence training and high need achievement, and the Betas do not aspire
beyond their abilities. He presents the hope that guidance counselors and company personnel
men can selectively counteract these influences in the real world.


SUSAN M. ERVIN
University of California, Berkeley



ROBERT R. BUSH, ROBERT P. ABELSON, AND RAY HYMAN. Mathematics for Psychologists:
Examples and Problems. New York: Social Science Research Council, 1956. Pp.
iv + 86.


This paperbound volume, prepared for the Social Science Research Council during 
the summer of 1954 under Bush’s direction, provides a fund of examples and problems 
illustrating mathematical applications in psychology. These are chosen and classified so 
that they can be used in four of the standard undergraduate mathematics courses. The 
book is not, and was not intended to be, either a systematic treatise on mathematics or
on the uses of mathematics in psychology; rather, it presents specific illustrations, drawn
from the psychological literature, of applications of some of the more familiar mathematical
topics. No really elaborate developments are presented in toto. Although its main use
undoubtedly will be to supplement mathematics texts, it should also aid those preparing
courses on mathematical psychology.
Within their chosen framework, the authors have done an effective job. The coverage,
although not intended to be exhaustive, is broad, the references to the literature are generous
(124 items in the bibliography), and the writing is concise and clear. My only question is 
whether teachers of mathematics will not find the descriptions of the underlying psycho- 
logical problems too abbreviated. Probably they will be forced to read some of the research 
literature before they will feel reasonably confident in employing these examples; quite 
possibly this will serve a desirable long range purpose, and certainly with this volume in 
hand teachers of mathematics will know where to read. 

Each of the four main sections of the book is keyed to a standard mathematics 
text in the sense that each subsection corresponds to one or a few subsections of the text. 
For example, the calculus reference is Randolph and Kac’s, Analytic Geometry and Calculus. 
There are 91 examples classed under such headings as: inequalities, equation of a line, 
limits, derivatives, maxima and minima, definite integrals, exponential functions, Taylor's 
formula, and partial derivatives. As is true throughout, these examples are drawn largely 
from testing theory, psychophysics, physiological psychology, and learning. Kershner
and Wilcox's The Anatomy of Mathematics is the text for mathematical foundations.
Thirty examples are given, illustrating ideas from the algebra of sets, cartesian products, 
relations, and functions. The third part on matrix algebra uses Aitken’s Determinants 
and Matrices and includes 65 illustrations of such matters as elementary matrix operations, 
determinants, solutions of linear equations, and linear independence. The final part, 
devoted to probability theory, refers to Feller's An Introduction to Probability Theory and
its Applications. Beginning with sample spaces, the 67 examples range over such topics as
binomial coefficients, statistical independence, random variables, expectation and variance,
and Markov chains.

Considering the rather rapid development of mathematical psychology, one can
only hope that the Social Science Research Council will see fit to supplement or revise
this useful problem list every five years or so.


R. DUNCAN LUCE
Harvard University


CALVIN S. HALL AND GARDNER LINDZEY. Theories of Personality. New York: John Wiley
& Sons, Inc., 1957. Pp. xi + 572.

This book is designed to provide a "single source to which the student [illegible]"
[illegible] summaries of major [illegible]
theories as identified by Hall and Lindzey and as described by them with the advice and
criticism of leading protagonists of the respective theories. The titles of these main chapters


will indicate the range of content of this book: (II) Freud's Psychoanalytic Theory, (III)
Jung's Analytic Theory, (IV) Social Psychological Theories: Adler, Fromm, Horney,
and Sullivan, (V) Murray's Personology, (VI) Lewin's Field Theory, (VII) Allport's
Psychology of the Individual, (VIII) Organismic Theory, (IX) Sheldon's Constitutional
Psychology, (X) Factor Theories, (XI) Stimulus-Response Theory, (XII) Rogers' Self
Theory, and (XIII) Murphy's Biosocial Theory.

There can be no question but that Hall and Lindzey have performed a valuable 
service in preparing and making available this material. Heretofore there has been no 
comparable source, and a noticeable resultant one-sidedness in many individual psychol- 
ogists’ knowledge of personality theory. Furthermore, to the extent that the main chapters 
have been separately reviewed in advance of publication by those best able to say whether 
they truly reflect the theories they discuss, we are entitled to regard them as relatively 
authoritative—at least with respect to matters that are explicitly discussed. At the same 
time, since all of the chapters are the work of but two authors working in collaboration, 
there is a unity of presentation and consistency of style that is often lacking in books as 
eclectic as this one. 

A book such as this has many potential uses. Inevitably the requirements suggested 
by these uses are to some extent in conflict, and it is of interest to see which have won out. 

One potential use is as a text. From this point of view one looks for a book that is
[illegible] idiosyncrasies of a particular teacher's approach to the subject. All these qualities seem
to be present.

A second potential use is as a reference book by those already somewhat familiar
with the field. On this score one would wish to set higher standards than those achieved
for completeness of coverage, both of ideas germane to the various theories and of the
surrounding literature. This is not a serious shortcoming in view of the chapters that
regularly have appeared in the Annual Review of Psychology on personality assessment
and related topics.

A third potential use is in the manner of an original contribution. From this point
of view it looks as if the authors have consciously refused to make the most of their best
opportunity. In the opening and closing chapters, and in the concluding section of each
main chapter, Hall and Lindzey come face-to-face with the [illegible]
personality theory. Admittedly, no stand they might have taken [illegible]

Under these circumstances perhaps the best strategy for any would-be personality
theorist is a perfect illustration of classic scientific method, namely, the isolation of instances
in which two personality theories lead to divergent predictions, followed by the
unbiased collection and rigorous analysis of pertinent empirical data. A thoughtful reading
of Hall and Lindzey's book is bound to suggest many such instances.


D. R. SAUNDERS 
Educational Testing Service 


G. HERDAN. Language as Choice and Chance. Groningen, Netherlands: P. Noordhoff,
1956. Pp. xiii + 356.


Although the publishers announce that this book "aims at providing a systematic
exposition of the quantitative structure of language," the author is more modest and in
his preface narrows this down to what he conceives of as the "four main branches of literary
statistics: Stylolinguistics, Statistical Linguistics, Information Theory, and Linguistic
Duality." The term "literary statistics" is perhaps most apt to describe the contents of the
book. The frequent qualification of topics as “linguistic” is in the broad sense of pertaining 
to language and does not imply that the book makes use of the methods or results of 
modern descriptive linguistics. With the exception of some material on the distribution of 
phoneme occurrences, the linguistic data are counts from written sources of the occurrence 
of words, syllables, letters, and similar elements not regarded as primary in study of struc- 
tural linguistics. Some of these data are the work of the author and are new; many have 
been collected from published works. 

The statistical problems the author approaches have some intrinsic interest and are 
doubtless important for the critical study of literary material, but his handling of them is 
disappointing. For example, to the problem of disputed authorship he brings only abbreviated
versions of the techniques used by G. U. Yule in A Statistical Study of Literary
Vocabulary. These are based on comparing the distribution of nouns by frequency of
occurrence in the disputed works with similar distributions from the known works of the
contended authors. Yule noted that the means and variances of these (J-shaped) distribu- 
tions increased with sample size (number of running words) and that the variances exceeded 
the means. By analogy with accident statistics, Yule decided he was dealing with com- 
pound Poisson distributions (which are descriptive of the distribution of individuals by 
number of accidents when risk of accidents differs among individuals). He then showed 
that the coefficient of variation for the component distributions with respect to their 
average mean should be independent of sample size and a useful statistic for comparing
the noun distributions. When he computed a function of this statistic for the known works 
of the authors in question, Thomas à Kempis and Gerson, Yule found its value for the 
disputed work, De Imitatione Christi, more in accord with the works of à Kempis. Also, 
contingency tables between the number of nouns used 1, 2, 3, ..., n times in the Imitatione
and 1, 2, 3, ..., m times in the known works of these authors showed higher association
with the works of à Kempis.
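
The quantities mentioned in this paragraph can be illustrated with a small computation. What follows is only a minimal sketch, not Yule's or Herdan's own procedure. It assumes the data are given as a frequency spectrum (how many distinct nouns occur exactly x times), and it uses the form of Yule's characteristic that is commonly quoted in later literature, K = 10^4 (S2 - S1) / S1^2, where S1 and S2 are the first and second moment sums of the spectrum. The review itself does not state a formula, and the function name and the toy counts are hypothetical.

```python
from math import sqrt


def spectrum_statistics(spectrum):
    """Summarize a noun frequency spectrum.

    `spectrum` maps x (number of times a noun occurs) to the number of
    distinct nouns that occur exactly x times. Hypothetical helper.
    """
    types = sum(spectrum.values())                        # distinct nouns
    s1 = sum(x * f for x, f in spectrum.items())          # running nouns (tokens)
    s2 = sum(x * x * f for x, f in spectrum.items())      # second moment sum
    mean = s1 / types
    variance = s2 / types - mean ** 2
    return {
        "types": types,
        "tokens": s1,
        "mean": mean,
        "variance": variance,
        # Coefficient of variation taken directly from the distribution.
        "coefficient_of_variation": sqrt(variance) / mean,
        # Yule's characteristic in its commonly quoted form (assumption).
        "yule_k": 1e4 * (s2 - s1) / s1 ** 2,
    }


if __name__ == "__main__":
    # Invented J-shaped toy spectrum: 50 nouns occur once, 20 twice, and so on.
    toy = {1: 50, 2: 20, 3: 10, 5: 4, 10: 1}
    for name, value in spectrum_statistics(toy).items():
        print(name, value)
```

In the toy spectrum the variance exceeds the mean, which is the feature of the observed distributions that the review says led Yule to the compound Poisson analogy.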

Yule's study is highly empirical, as he readily admitted, and not entirely complete
or satisfactory from a statistical point of view. It was a work of his later years and explicitly
left its more formidable problems for others to solve. G. Herdan presumedly took up the
work at this point. He did little more, however, than note that Yule’s statistic was only
slightly different from the coefficient of variation computed directly from the distribution
of nouns. He used this simpler statistic to rework Yule's data and examine [...]
material. From a linguistic point of view, Herdan's acceptance of Yule's [...]
approach to resolution of disputed authorship seems somewhat [...]. It [...],
for example, that the range of vocabulary used by a writer depends [...]
of the subject matter discussed. There is danger that the disputed work will be compared
with works more similar in content to those of one of the [...] than those of
the other, particularly when the material for comparison [...] approach
would probably be strongly biased by this error. On the [...]
carried out in terms of classes of constructions at the [...] level, the results
might reflect the formality of the language [...]
influence of content should be much reduced. In [...] be easier to match works
by level of formality than by content, suggesting that [...] in terms of larger
constructions would be more reliable than word counts. Even if this is not the case, it
seems likely that the [...] those which
determine choice of larger constructions. [...] at the syntactical level
would provide an independent test of conclusions reached on the word level.

The section on “statistical linguistics” of this book is devoted to exhibiting many
examples of frequency distributions of what the author describes as “certain linguistic 
forms," e.g., phonemes, letters, word length in terms of number of letters and syllables, 
grammatical forms (classical parts of speech), and metrical units in Latin and Greek 
hexameter. The author is much impressed by the constancy of these distributions in
samples from different texts, and calls it the “basic law of linguistic communication and
realization." He makes no attempt to account for the distributions in terms of any model,
except to note that the preferred positions of metrical divisions after 3, 5, and 8 syllables
in Greek hexameter correspond to successive terms in that delight of mathematical
recreations, the Fibonacci series (Northrop, E. P. Riddles in Mathematics: A Book of
Paradoxes. New York: Van Nostrand, 1944).

The introduction to the section on information theory consists of the repetition of 
Shannon’s main results for the discrete case. A species of information measure is then used 
to characterize a novel sort of relationship between a text in one language and its translation 
into another. Each word is entered in a two-fold classification according to the number of 
syllables in the original, and the number of syllables in the corresponding word of the trans- 
lation. This leads to certain difficulties from free translation, or from German words which 
correspond to a half a dozen or more in English, or from the habit in Slavic languages not 
to use articles and to omit the verb “to be." These are resolved by appropriate rules, and 
the dependencies in the resulting contingency tables are characterized [...]
language turns out to be most similar to English, then German, Czech, and Russian least
like English. This, the author says, “reflects the varying degrees of relationship” between
the languages. What this relationship might mean in terms of the historical affinities of
[...]


The fifth section, “Linguistic Duality,” is devoted to various opposing tendencies 
in language, such as the fact that words of greater frequency tend to be of lesser length, 
that words pronounced the same tend to acquire several meanings, and concepts or meanings
tend to acquire synonymous words, and that freedom and constraint, choice and chance, 
contribute to determine the sequence of symbols in written language. This section contains 
data showing that when Chinese ideographs are classified by the number of brush strokes 
as they are in dictionaries, the number of characters having 1, 2, 3, ..., 27 brush strokes
is distributed much like the number of genera of insects having 1, 2, 3, ..., n species. 
The author construes this as evidence that the Chinese dictionary is organized on “taxo- 
nomic" principles. If this is the case, then the distribution of words in written text by their 
frequency of occurrence, of cities by population, and of incomes by size, reflect taxonomic 
principles also, since the many examples collected by Zipf show that they are also dis- 
tributed in this way. Herbert Simon has demonstrated that these distributions can be 
regarded as arising from a stochastic model which yields the limiting form of Yule's well-
known distribution of genera by the number of species. This model is of so great generality
that we should attach no more profound significance to the fact that certain phenomena 
conform to it than we do to conformity of other phenomena to the normal distribution. 
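
The generality of such a stochastic model can be seen in a short simulation. The sketch below is an illustration under stated assumptions, not a reproduction of Simon's derivation: it uses the commonly cited form of the model, in which each new token is a brand-new word with a fixed probability `alpha` and is otherwise a repetition of a token drawn uniformly from the text so far, so that a word is repeated with probability proportional to its current count. The function name, parameter values, and seed are hypothetical.

```python
import random
from collections import Counter


def simulate_simon(n_tokens, alpha, seed=0):
    """Return the frequency spectrum of a text grown by the assumed model.

    The result maps x to the number of distinct words occurring exactly
    x times after n_tokens tokens have been generated.
    """
    rng = random.Random(seed)
    tokens = [0]          # the text so far, as integer word identifiers
    next_word = 1
    for _ in range(n_tokens - 1):
        if rng.random() < alpha:
            # With probability alpha, introduce a word not seen before.
            tokens.append(next_word)
            next_word += 1
        else:
            # Otherwise copy a uniformly chosen earlier token, which picks
            # each word with probability proportional to its count so far.
            tokens.append(rng.choice(tokens))
    occurrences = Counter(tokens)            # word -> number of occurrences
    return Counter(occurrences.values())     # count -> number of words


if __name__ == "__main__":
    spectrum = simulate_simon(20000, 0.1, seed=1)
    for x in range(1, 6):
        print(x, spectrum[x])
```

Nothing in the simulation refers to language, taxonomy, cities, or incomes, which is the reviewer's point: a skew distribution of this kind needs no subject-specific explanation.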

The final section of this work is a review of large sample statistical methods, through 
product moment correlation. The treatment is strictly Pearsonian, including the sample 
size for the denominator of the variance estimator and such topics as the critical ratio
test for binomial proportions, and mean square contingency. Since word samples in literary
statistics are usually very large, the author feels these methods are adequate.

Much of the interpretation of the statistical results obtained in this book is vague
and shows tendencies toward the metaphysical in the section on Linguistic Duality. In the
opinion of the reviewer, this book cannot be considered a significant contribution to the
study of language.

R. DARRELL BOCK
University of North Carolina




Minutes of the 
1958 ANNUAL BUSINESS MEETING 
of the 
PSYCHOMETRIC SOCIETY 
The regular Annual Meeting of the Psychometric Society was held in Washington, D. C.
on Tuesday, September 2, 1958. President Frederick Mosteller called the meeting to order
at 3:05 P. M.
The minutes of the previous Annual Meeting were read and approved.
On a ballot for the election of two new members of the Council of Directors,
Dr. Lloyd G. Humphreys and Dr. Ardie Lubin were elected for a term of three years, ending
in 1961.


Dr. John E. Milholland reported for the Membership Committee. The Membership 
Committee nominated 44 persons as full members and 21 as student members. 


It was moved, seconded, and passed that the 21 persons named below be elected as
student members.


Vladimir V. Almendinger, Jr., Brighton 35, Massachusetts 
Richard F. Arnold, East Lansing, Michigan 
Mrs. Joan Hauser Bailey, Van Nuys, California 
Mark Philip Bryden, Montreal, Canada 
Cherry Ann Clark, South Pasadena, California 
Bart B. Cobb, Jr., San Antonio, Texas 
Kern William Dickman, Urbana, Illinois 
Howard J. Douglas, Lafayette, Indiana 
Jean Engler, University of North Carolina, Chapel Hill, North Carolina 
Morton P. Friedman, Columbus, Ohio
Arthur H. Hill, University of Minnesota, Minneapolis, Minnesota 
George G. Karas, West Lafayette, Indiana 
Mrs. Ann S. McColskey, Volusia County Health Unit, Daytona Beach, Florida 
Kazuo Nihira, Los Angeles, California 
Melvin R. Novick, Chicago, Illinois 
LeRoy A. Olson, Madison, Wisconsin 
Erich P. Prien, Jr., Cleveland, Ohio 
Marvin Snider, Ann Arbor, Michigan 
Douglas K. Spiegel, Chapel Hill, North Carolina 
Edward E. Ware, Urbana, Illinois 
Leonard Wevrick, University of Illinois, Urbana, Illinois 


It was moved, seconded, and passed to elect as full members the following 44 individuals.


Joel W. Ager, Jr., Pleasant Ridge, Michigan
Edward F. Alf, Jr., San Diego, California
Eivind Henri Baade, Oslo, Norway
Rudolph G. Berkhouse, Alexandria, Virginia
Allan Birnbaum, Columbia University, New York, New York
Robert F. Boldt, Department of the Army, AGO, Washington, D. C.
[...]ry Bornstein, Arlington, Virginia
[...] University, St. Louis, Missouri
Joan H. Cantor, Peabody College, Nashville, Tennessee
Edward Calvin Carterette, University of California, [...]
Robert E. Chandler, Detroit, Michigan
Kenneth E. Clark, University of Minnesota, Minneapolis, Minnesota
William V. Clemans, National Board of Medical Examiners, Philadelphia, Pa.
Dorothy M. Clendenen, The Psychological Corporation, New York, New York
Adriaan D. deGroot, Amsterdam, Holland
Edmund Emil Dudek, U.S. Naval Personnel Research, San Diego, California
Wendell R. Garner, Johns Hopkins University, Baltimore, Maryland
Sten Henrysson, Stockholm, Sweden
Peter A. Holman, Downey, California
Robert Anthony Jones, Redondo Beach, California
Herbert Kaizer, IBM Corporation, Lexington, Massachusetts
D. James Klett, Perry Point, Maryland
Eiichi Komiyama, Kitaku, Tokyo, Japan
Samuel S. Komorita, Vanderbilt University, Nashville, Tennessee
R. Duncan Luce, Cambridge, Massachusetts
Winton Howard Manning, Washington University, St. Louis, Missouri
Philip R. Merrifield, Long Beach, California
Jerome L. Myers, University of Massachusetts, Amherst, Massachusetts
Paul DeLay Nelson, Naval Air Station, Corpus Christi, Texas
Mrs. Nageswari Rajaratnam, Urbana, Illinois
Olav Reiersol, Institutt for Matematiske Fag, Oslo, Norway
James H. Ricks, Jr., The Psychological Corporation, New York, New York
Bryan Boroughs Sargent, III, Knoxville, Tennessee
Paul A. Schwarz, American Institute for Research, Pittsburgh, Pennsylvania
William S. Schwarzbek, General Electric Company, New York, New York
Lee B. Sechrest, Northwestern University, Evanston, Illinois
Robert Seibel, Peekskill, New York
Maynard W. Shelly, Columbus, Ohio
Roger Newland Shepard, Bell Telephone Laboratories, Murray Hill, New Jersey
Walter R. Stellwagon, Syracuse, New York
Patrick Suppes, Stanford University, Stanford, California
James M. Vanderplas, Washington University, St. Louis, Missouri
Charles L. Walter, University of Tennessee, Knoxville, Tennessee
Lawrence K. Waters, Ohio State University, Columbus, Ohio

It was moved, seconded, and passed that the Membership Committee be thanked for
their excellent work.

Dr. Irving Lorge reported for the Committee on the Relations between the Psychometric
Society and the Psychometric Corporation. A copy of this report is attached.

It was moved, seconded, and passed that the report of this committee be accepted with thanks,
and that the committee, consisting of Dr. Lorge as Chairman, Dr. Clyde H. Coombs
and Dr. John M. Stalnaker, be continued.

President Mosteller asked for a show of hands to indicate the sentiments of the members
[...] future relationships between the Psychometric Society and the
Corporation. The first alternative was to continue the present organizational
[...]. The second was to enlarge the Corporation so that all members of the
Society become members of the Corporation. The third proposal was to proceed with the
recommendations of the Committee and to take steps to incorporate the Society and to dissolve
the Corporation. It appeared to be the sense of the Meeting that the third possibility was the
most desirable. It was moved and seconded that the Committee on the Relations between the
Psychometric Society and the Psychometric Corporation be instructed to continue to [...]
to incorporating the Society and the dissolving of the Corporation. The motion
passed unanimously.


It was moved and seconded that the President appoint Dr. William B. Schrader, the 
Treasurer of the Society, as a member of the Committee on Relations between the Psycho- 
metric Society and the Psychometric Corporation. Motion passed. 


The motion was made, seconded, and passed that up to $500 be made available for the 
work of the Committee on Relations between the Psychometric Society and the Psychometric 
Corporation. Half of this money may be used for expenses in connection with submitting the 
draft of a new constitution of the Society for the approval of the membership and half for 
expenses in connection with incorporating the Society. Motion passed. 


Dr. J. E. Keith Smith reported for the Program Committee. Of ten abstracts of papers 
submitted for consideration for presentation at the Annual Meeting, nine were accepted. Three 
symposia were scheduled, one of which was a proposal submitted by a member and two were 
developed by the Program Committee. It was moved and seconded that the report be accepted
with thanks. Motion passed.


It was moved, seconded, and passed that the Council of Directors obtain information on
the affiliation of the Psychometric Society with the American Psychological Association. 


The Secretary's report was presented by Dr. Philip H. DuBois. He stated that approxi- 
mately 20 members, not members of the American Psychological Association, took advantage 
of the system of registering for the Psychometric Society Meeting by mail. The Secretary's 
report was accepted with thanks. 


It was moved and seconded that the minutes of the Annual Business Meeting be published 
hereafter in Psychometrika. Motion passed. 


It was moved and seconded that a committee be appointed to study special membership
categories, including foreign membership and life membership for older individuals. Motion
passed.


It was moved and seconded that an auditing committee be appointed to audit the books of
the Treasurer. Motion passed.


It was moved and seconded that the President and President-Elect appoint a committee
to explore the possibility of special events to celebrate the 25th anniversary of the Society in 
the year 1960. It is understood that this special committee will not replace the regular Program 
Committee, but will supplement it. Motion passed. 

The report of the Treasurer was presented by Dr. Schrader. A copy is attached. It was
accepted with thanks. 


Dr. Coombs, reporting for the Elections Committee, stated that Dr. Frederic Lord had
been elected President of the Psychometric Society for a period of one year beginning
October 1, 1958.


The meeting was adjourned at 4:05 P. M. 
Philip H. DuBois 
Secretary 


Report of the 
COMMITTEE ON THE RELATIONS BETWEEN 
THE PSYCHOMETRIC SOCIETY AND THE PSYCHOMETRIC CORPORATION 


September 2, 1958 


1. The Committee on the Relations between the Psychometric Society and the Psychometric
Corporation reported to the Psychometric Corporation that the most advisable procedure
for effecting the merger of the Psychometric Corporation with the Psychometric Society would
be to dissolve the Corporation, turn its assets over to the Psychometric Society and to
incorporate the Society. The Corporation accepted the Committee report and continued Irving Lorge
and John Stalnaker as its representatives on the Joint Committee.


[...] Constitution of the Psychometric [...] of the assets of the
Psychometric Corporation [...] of Psychometrika for an interim period of not more [...]


3. The Committee has investigated the general procedures for incorporation of the
Society as a non-profit organization. It believes, however, that it will need to have legal
counsel to determine in what state such incorporation would be most desirable, and, then, to
retain counsel to effect the incorporation of the Psychometric Society as a non-profit
organization which also will be tax-free.


4. The Committee, therefore, recommends that steps be taken to adopt a new
Constitution for the Psychometric Society according to the provisions of Article XII of the
current constitution which requires
a) previous approval "by a three-fourths vote of the entire membership of the
Council of Directors and the Editorial Council as a whole" and
b) the subsequent approval "by a vote of two-thirds of the Members Present at any
Annual Meeting or by a two-thirds vote of all Members responding by vote to a
mailed ballot."

5. The Committee recommends that the Treasurer of the Society and of the Corporation
be added to the joint committee to facilitate the preparation of the Constitution for vote by the
Council of Directors and the Editorial Council, to submit the Constitution for a mailed vote of
the membership, and to facilitate the designation of the State for incorporation of the Society.
6. The Committee recommended a budget of $250.00 for the preparation and mailing
of the proposed new Constitution to the Council of Directors and the Editorial Council and
subsequently to the membership, and a budget of $250.00 for legal counsel in the actual
incorporation of the Society. The entire sum is to be budgeted by the Psychometric Society.


7. The Committee has given consideration to a number of proposals to develop a
suitable memorial to Professor L. L. Thurstone. It will solicit further suggestions for
consideration at the next annual meeting of the Society.

Respectfully submitted,
Irving Lorge
Clyde Coombs
John Stalnaker


PSYCHOMETRIC SOCIETY

Statement of Receipts and Disbursements for Fiscal Year
Ended June 30, 1958

RECEIPTS (Dues)

    Year    Members    Student Members
    1958        554                 48
    1957         41                  9
    1956          3
                598                 57               $4,414.00
Received with Dues for Corporation Publications         192.60
Overpayments                                               .26
Partial Payments                                          4.60
Total Receipts                                       $4,611.46

DISBURSEMENTS

Psychometric Corporation (90% of dues)               $3,976.74
Psychometric Corporation (Publications)                 192.60
Stationery and Postage                                  175.70
Secretarial Services                                     89.27
Bank Charges                                              8.24
Telephone                                                 9.96
Total Disbursements                                  $4,452.51

BALANCE

Balance, June 30, 1957                               $1,186.25
Receipts, 1957-58                                     4,611.46
                                                      5,797.71
Disbursements, 1957-58                                4,452.51
Balance, June 30, 1958                               $1,345.20


PSYCHOMETRIC CORPORATION

Statement of Receipts and Disbursements for Fiscal Year
Ended June 30, 1958

RECEIPTS

Subscriptions (less agency discounts)                        $5,656.00
Psychometric Society (90% of dues)                            3,976.74
Sale of Back Issues (less discounts)                            428.05
Sale of Monographs 5-8 (less discounts)                         230.60
Interest on Savings Accounts                                    271.25
Reprints                                                        662.83
Net overpayments                                                 28.31

DISBURSEMENTS

Printing and Mailing Psychometrika
    Volume 22, No. 2, through 23, No. 1                      $6,873.85
Reprints                                                        310.61
Stipend of Managing Editor (7/1/57--6/30/58)                    750.00
Stipend of Assistant Managing Editor (7/1/57--6/30/58)          500.00
Stipend of Treasurer (7/1/57--6/30/58)                          250.00
Secretarial Services: Editorial Office                          800.00
Secretarial Services: Business Office                           112.90
Stationery and Postage                                          185.52
Mailing Back Issues and Monographs                               86.28
Refunds                                                          37.80
Miscellaneous                                                    31.85
                                                             $9,938.81


BALANCE AND RESERVES

Balance, June 30, 1957                                       $5,927.77
Reserve Funds, June 30, 1957
    Englewood Savings and Loan Assn.
        Englewood, Colorado                                   3,500.00
    Metropolitan Savings and Loan Assn.
        Los Angeles, California                               3,500.00
Total                                                        12,927.77
Receipts, 1957-58                                            11,253.78
Sum                                                          24,181.55
Disbursements, 1957-58                                        9,938.81
Remainder                                                   $14,242.74

Balance, June 30, 1958                                       $7,242.74
Reserve Funds, June 30, 1958
    Englewood Savings and Loan Assn.
        Englewood, Colorado                                   3,500.00
    Metropolitan Savings and Loan Assn.
        Los Angeles, California                               3,500.00
Total, Balance and Reserve Funds                            $14,242.74


OBLIGATIONS

Estimated cost of Psychometrika, Vol. 23, Nos. 2-4
    Printing and Mailing                                     $5,200.00
    Stipends (7/1/58--12/31/58)                                  [...]
    Secretarial Services                                         [...]
                                                             $6,400.00

BALANCE AND RESERVES, LESS OBLIGATIONS                       $7,842.74


INDEX FOR VOLUME 23 


Adams, Ernest (with S. Messick). An axiomatic formulation and generalization of successive
    intervals scaling. 355-368.
Alexander, Irving E. (with B. P. Karon). A modification of Kendall's tau for measuring
    association in contingency tables. 379-383.
Atkinson, Richard C. A Markov model for discrimination learning. 309-322.
Audley, R. J. The inclusion of response times within a stochastic description of the learning
    behavior of individual subjects. 25-31.
Bock, R. Darrell. Remarks on the test of significance for the method of paired comparisons.
    323-334.
Bock, R. Darrell. Review of “G. Herdan, Language as Choice and Chance.” 392-394.
Brownless, Vera T. A retest method of studying partial knowledge and other factors
    influencing item response. 67-73.
Carroll, John B. Review of “Henry Quastler (Ed.), Information Theory in Psychology.”
    275-276.
Collier, Raymond O., Jr. Analysis of variance for correlated observations. 223-[...].
Creager, John A. General resolution of correlation matrices into components and its
    utilization in multiple and partial regression. 1-8.
Cronbach, Lee J. Review of “G. S. Welsh and W. G. Dahlstrom (Eds.), Basic Readings
    on the MMPI in Psychology and Medicine.” 384-385.
Cureton, Edward E. The average Spearman rank criterion correlation when ties are present.
    271-272.
Ervin, Susan M. Review of “J. B. Miner, Intelligence in the United States.” [...]
Feldt, Leonard S. A comparison of the precision of three experimental designs employing
    a concomitant variable. 335-353.
Feldt, Leonard S. (with M. W. Mahmoud). Power function charts for specification of
    sample size in analysis of variance. 201-210.
Fruchter, Benjamin (with E. Novak). A comparative study of three methods of rotation.
    211-221.
Gaito, John. The single Latin square design in psychological research. 369-378.
Garside, R. F. The measurement of function fluctuation. 75-83.
Gerard, Harold B. (with H. N. Shapiro). Determining the degree of inconsistency in a
    set of paired comparisons. 33-46.
Glaser, Robert. Review of “J. S. Bruner, J. J. Goodnow, and G. A. Austin, A Study of
    Thinking.” 184-186.
Green, Bert F., Jr. Review of “L. J. Cronbach and G. C. Gleser, Psychological Tests and
    Personnel Decisions.” 179-180.
Gulliksen, Harold. Comparatal dispersion, a measure of accuracy of judgment. [...]
Gulliksen, Harold (with J. W. Tukey). Reliability for the law of comparative judgment.
    95-110.
Guttman, Louis. To what extent can communalities reduce rank? 297-308.
Hoffman, Paul J. Predetermination of test weights. 85-92.
Kaiser, Henry F. The varimax criterion for analytic rotation in factor analysis. 187-200.
Karon, Bertram P. (with I. E. Alexander). A modification of Kendall's tau for measuring
    association in contingency tables. 379-383.
Keats, John A. (with V. T. Brownless). A retest method of studying partial knowledge
    and other factors influencing item response. 67-73.
Lord, Frederic M. Some relations between Guttman's principal components of scale analysis
    and other psychometric theory. 291-296.
Luce, R. Duncan. Review of “R. R. Bush, R. P. Abelson, and R. Hyman, Mathematics
    for Psychologists, Examples and Problems.” 391.
Lyerly, Samuel B. The Kuder-Richardson formula (21) as a split-half coefficient, and
    some remarks on its basic assumption. 267-270.
MacLean, Angus G. Properties of the item score matrix. 47-53.
McHugh, Richard B. Note on “Efficient estimation and local identification in latent
    class analysis.” 273-274.
McNemar, Quinn. Attenuation and interaction. 259-265.
Mahmoud, Moharram W. (with L. S. Feldt). Power function charts for specification of
    sample size in analysis of variance. 201-210.
Messick, Samuel (with E. Adams). An axiomatic formulation and generalization of
    successive intervals scaling. 355-368.
Morin, Robert E. Review of “J. K. Adams, Basic Statistical Concepts.” 180-182.
Morin, Robert E. Review of “H. E. Garrett, Elementary Statistics.” 182-183.
Mosteller, Frederick. The mystery of the missing corpus. 279-289.
Neuhaus, Jack O. (with C. Wrigley and D. R. Saunders). Application of the quartimax
    method of rotation to Thurstone’s primary mental abilities study. 151-170.
Novak, Edwin (with B. Fruchter). A comparative study of three methods of rotation.
    211-221.
Psychometrika. Rules for preparation of manuscripts. 93-94.
Saunders, David R. Review of “C. S. Hall and G. Lindzey, Theories of Personality.”
    391-392.
Saunders, David R. (with C. Wrigley and J. O. Neuhaus). Application of the quartimax
    method of rotation to Thurstone’s primary mental abilities study. 151-170.
Sawrey, William L. A distinction between exact and approximate nonparametric methods.
    171-177.
Shapiro, Harold N. (with H. B. Gerard). Determining the degree of inconsistency in a
    set of paired comparisons. 33-46.
Sokal, Robert R. Thurstone’s analytical method for simple structure and a mass
    modification thereof. 237-257.
Sutcliffe, J. P. Error of measurement and the sensitivity of a test of significance. 9-17.
Tucker, Ledyard R. Determination of parameters of a functional relation by factor analysis.
    19-23.
Tucker, Ledyard R. An inter-battery method of factor analysis. 111-136.
Tukey, John W. (with H. Gulliksen). Reliability for the law of comparative judgment.
    95-110.
Ward, Joe H., Jr. The counseling assignment problem. 55-65.
Wittenborn, J. R. Review of “W. G. Cochran and G. Cox, Experimental Designs.” (2nd ed.)
    277-278.
Woods, Charles L. Review of “W. A. Wallis and H. V. Roberts, Statistics: A New
    Approach.” 183-184.
Wrigley, Charles (with D. R. Saunders and J. O. Neuhaus). Application of the quartimax
    method of rotation to Thurstone's primary mental abilities study. 151-170.


ERRATUM
In Ward, Joe H., Jr., The counseling assignment problem. Psychometrika, 1958, 23,
55-65.
In the center of page 64 the constant following the equals sign should be 57,360
rather than 57,630.


